A Different Architecture for a Different Problem
Transformer inference is memory-bound, not compute-bound. The bottleneck isn't how many operations you can perform—it's how fast you can feed data to those operations. Asimov is designed from first principles around this reality.
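As a rough sanity check, here is a back-of-envelope roofline comparison in Python showing why single-batch decode lands far on the memory-bound side. The hidden size, data type, and accelerator figures are illustrative assumptions, not Asimov specifications.

```python
# Back-of-envelope check that single-batch decode is bandwidth-bound.
# All numbers below (hidden size, dtype, accelerator specs) are
# illustrative assumptions, not Asimov specifications.

hidden = 8192            # model hidden size (assumed)
bytes_per_weight = 2     # FP16/BF16 weights (assumed)

# One matrix-vector product per weight matrix during decode:
flops = 2 * hidden * hidden                 # multiply + add per weight
bytes_moved = hidden * hidden * bytes_per_weight

intensity = flops / bytes_moved             # ~1 FLOP per byte for FP16

# A hypothetical accelerator with 500 TFLOP/s of compute and 3 TB/s of
# memory bandwidth needs ~167 FLOP/byte to stay compute-bound, roughly
# two orders of magnitude above what a decode GEMV offers.
peak_flops = 500e12
peak_bw = 3e12
balance_point = peak_flops / peak_bw

print(f"arithmetic intensity: {intensity:.1f} FLOP/B")
print(f"balance point:        {balance_point:.0f} FLOP/B")
print("memory-bound" if intensity < balance_point else "compute-bound")
```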
Key Specifications
Memory-First Design
Most AI accelerators maximize theoretical compute (FLOPs), then add memory as an afterthought. Asimov inverts this: we designed around memory bandwidth and capacity first, with compute balanced to match. The result is over 90% realized memory bandwidth on real Transformer workloads—versus under 30% for GPUs running the same models.
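A quick sketch of why realized bandwidth, rather than peak, determines decode throughput: every generated token has to stream the full weight set from memory once. The model size and bandwidth figures below are illustrative assumptions, not Asimov numbers.

```python
# Realized (not peak) bandwidth sets decode throughput, because each
# generated token streams all weights from memory once.
# Model size and bandwidth figures are illustrative assumptions.

weight_bytes = 70e9 * 2        # e.g. a 70B-parameter model in FP16 (assumed)
peak_bw = 1.0e12               # 1 TB/s peak memory bandwidth (assumed)

for name, utilization in [("90% realized", 0.90), ("30% realized", 0.30)]:
    effective_bw = peak_bw * utilization
    tokens_per_s = effective_bw / weight_bytes
    print(f"{name}: {tokens_per_s:.1f} tokens/s per model replica")
```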
LPDDR Over HBM
We chose commodity LPDDR5x over High Bandwidth Memory. HBM delivers impressive peak bandwidth on paper, but at extreme cost, power, and supply chain risk. Our architecture achieves comparable realized bandwidth from LPDDR—while delivering 6x the memory capacity per chip at dramatically lower system cost.
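The trade-off in rough numbers, using public ballpark figures for LPDDR5X packages and HBM3 stacks and assumed package counts; neither configuration is an Asimov specification.

```python
# Rough comparison of an LPDDR5X memory system against HBM stacks.
# Device figures are public ballpark numbers; package and stack counts
# are assumptions for illustration, not Asimov's configuration.

lpddr_pkg_bw  = 68e9     # ~68 GB/s per x64 LPDDR5X package at 8533 MT/s
lpddr_pkg_cap = 32e9     # 32 GB per package (assumed configuration)
hbm_stack_bw  = 819e9    # ~819 GB/s per HBM3 stack
hbm_stack_cap = 24e9     # 24 GB per stack

lpddr_packages = 16      # assumed per-chip LPDDR package count
hbm_stacks     = 4       # assumed HBM configuration for comparison

print("LPDDR:", lpddr_packages * lpddr_pkg_bw / 1e12, "TB/s peak,",
      lpddr_packages * lpddr_pkg_cap / 1e9, "GB")
print("HBM  :", hbm_stacks * hbm_stack_bw / 1e12, "TB/s peak,",
      hbm_stacks * hbm_stack_cap / 1e9, "GB")

# The LPDDR system trades peak bandwidth for far more capacity per chip;
# the architecture's job is to keep realized bandwidth near that peak.
```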
Dual-Hemisphere Architecture
Two identical hemispheres can operate independently on separate workloads or collaboratively on larger problems. Each hemisphere has its own memory subsystem, enabling efficient scaling from single-chip deployments to multi-chip Titan systems without architectural compromises.
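A conceptual sketch of the two deployment modes. The classes below are illustrative stand-ins, not Asimov's driver API, and the memory sizes are assumed.

```python
# Conceptual sketch of the two modes a dual-hemisphere chip supports.
# Hemisphere/Chip are illustrative stand-ins, not a real driver API.

from dataclasses import dataclass

@dataclass(frozen=True)
class Hemisphere:
    name: str
    memory_gb: int        # each hemisphere owns its own memory subsystem

@dataclass
class Chip:
    left: Hemisphere
    right: Hemisphere

    def independent(self, model_a: str, model_b: str) -> dict:
        # Two separate workloads, one per hemisphere, no shared state.
        return {self.left: model_a, self.right: model_b}

    def collaborative(self, model: str) -> dict:
        # One larger model split across both hemispheres, e.g. the two
        # tensor-parallel halves of every weight matrix.
        return {self.left: f"{model}[shard 0/2]",
                self.right: f"{model}[shard 1/2]"}

chip = Chip(Hemisphere("left", 192), Hemisphere("right", 192))   # 192 GB assumed
print(chip.independent("model-a", "model-b"))
print(chip.collaborative("model-large"))
```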
TransWarp Engine
At the heart of Asimov is a 512×128 systolic array running at 2 GHz, with weight memory co-located at each processing element. This architecture minimizes data movement—the dominant source of power consumption and latency in AI inference. The array reconfigures dynamically: 512×128 for FFN Matrix-Matrix Multiplication (GEMM), 128×512 for memory-bound Matrix-Vector Multiplication (GEMV) in attention.
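Working out the peak throughput implied by the stated array dimensions and clock, assuming the usual one multiply-accumulate per processing element per cycle (that convention is an assumption, not a stated spec):

```python
# Peak throughput implied by the stated array size and clock.
# Assumes one multiply-accumulate (2 FLOPs) per PE per cycle, the usual
# systolic-array convention; that per-PE rate is an assumption here.

pes = 512 * 128                 # processing elements in the array
clock_hz = 2e9                  # 2 GHz from the spec
flops_per_pe_cycle = 2          # multiply + add

peak_flops = pes * clock_hz * flops_per_pe_cycle
print(f"peak: {peak_flops / 1e12:.0f} TFLOP/s")   # ~262 TFLOP/s

# The same 65,536 PEs can be addressed as 512x128 (tall, for FFN GEMMs)
# or 128x512 (wide, for bandwidth-bound attention GEMVs); reconfiguring
# changes the dataflow, not the amount of compute.
```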
Streaming Vector Acceleration
Dedicated hardware performs softmax, RMSNorm, RoPE, SwiGLU, and other activation functions at line rate—no kernel launches, no memory round-trips. Your model runs as a continuous pipeline where vectors flow through matrix ops, normalization, and nonlinearities without ever stalling for the CPU. New activation functions can be supported without silicon changes.
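For reference, the math those units implement, written out in NumPy. These are software definitions of the standard operations only, not a description of how the hardware computes them.

```python
# Reference (software) definitions of the vector operations the streaming
# units implement; NumPy is used here only to pin down the math.

import numpy as np

def softmax(x):
    z = x - x.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rmsnorm(x, gain, eps=1e-6):
    # Root-mean-square normalization with a learned per-channel gain.
    return x / np.sqrt(np.mean(x * x) + eps) * gain

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu(x, w_gate, w_up):
    # Gated activation used in most modern Transformer FFN blocks.
    return silu(x @ w_gate) * (x @ w_up)
```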
On-Chip General Purpose CPUs
Multiple on-chip ARMv9 64-bit general-purpose processor cores handle workload orchestration and provide a programmable escape hatch for frontier-model operations that don't fit standard patterns. Run custom logic when you need it, but keep the common path in dedicated hardware for deterministic latency and maximum throughput.
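A sketch of what that escape hatch might look like. The `custom_op` decorator and the scheduling hook it stands in for are purely illustrative, not a documented Asimov API; the top-k gating function is just one example of an operation that falls outside the fixed-function pipeline.

```python
# Hypothetical flow for an operation outside the fixed-function pipeline:
# the common path stays in dedicated hardware, the unusual op runs on the
# on-chip ARM cores. `custom_op` is an illustrative stand-in, not a real
# Asimov runtime API.

import numpy as np

def custom_op(fn):
    """Stand-in for a runtime hook that schedules `fn` on the ARM cores."""
    fn.runs_on = "arm_cores"
    return fn

@custom_op
def topk_gating(router_logits, k=2):
    # Example frontier-model operation: MoE top-k expert selection.
    idx = np.argpartition(router_logits, -k)[-k:]
    weights = np.exp(router_logits[idx])
    return idx, weights / weights.sum()
```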
Host Interface
PCIe Gen 6 with CXL support delivers 128 GB/s per hemisphere. The host handles tokenization and sampling; Asimov owns everything in between. Asimov's ability to operate independently might suggest the host processor would sit idle, but a fast host interface enables massive distributed KV cache management, multi-model loading, and more without slowing inference down.
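To put 128 GB/s in context, a back-of-envelope KV cache calculation using the standard per-token size formula. The model geometry below is an illustrative assumption.

```python
# Why host-link bandwidth matters for KV cache offload: the size of one
# request's cache and the time to move it over a 128 GB/s hemisphere link.
# Model dimensions are illustrative assumptions.

layers, kv_heads, head_dim = 80, 8, 128     # assumed model geometry
bytes_per_elem = 2                          # FP16 cache

# 2x for keys and values.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
context = 131_072                           # 128K-token context
cache_bytes = kv_bytes_per_token * context

link_bw = 128e9                             # per-hemisphere PCIe Gen 6 + CXL

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"full context      : {cache_bytes / 1e9:.1f} GB")
print(f"swap time         : {cache_bytes / link_bw:.2f} s")
```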
Scale-Out Interconnect
16 Tbps of direct chip-to-chip bandwidth with no switches or NICs required. Point-to-point links scale to 16,384 chips in ring, torus, mesh, and other topologies for tensor parallelism, pipeline parallelism, and MoE expert parallelism.
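A rough feel for what that bandwidth buys, using the standard ring all-reduce cost model. The per-link share of the 16 Tbps aggregate, the ring size, and the activation size are all illustrative assumptions.

```python
# Back-of-envelope ring all-reduce time for tensor parallelism over the
# direct chip-to-chip links. The per-link share of the 16 Tbps aggregate
# and the activation size are illustrative assumptions.

chips = 8
aggregate_bw = 16e12 / 8                   # 16 Tbps -> 2 TB/s per chip
per_link_bw = aggregate_bw / 8             # assume 8 point-to-point links

tensor_bytes = 64 * 8192 * 2               # batch 64 x hidden 8192, FP16 (assumed)

# Standard ring all-reduce: each chip transfers 2*(N-1)/N of the tensor.
bytes_per_chip = 2 * (chips - 1) / chips * tensor_bytes
print(f"all-reduce time ~ {bytes_per_chip / per_link_bw * 1e6:.2f} us")
```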
