POSITRON

Positron
Now Available
AtlasAtlas
Atlas

Atlas

Transformer Inference Server

  • 4x Performance per Watt versus GPUs

  • 2.5x Performance per Dollar vs H100

Now Available
CloudCloud
Cloud

Cloud

Managed Transformer Inference

  • High performance low latency model inference.

Performance versus H100 (Tokens per Second, Mixtral 8x7B)

01

Positron Release 1.1

Positron Atlas

3.8X PERFORMANCE FOR 1/3 THE COST
110

Nvidia DGX-H100

93
02

Positron Release 2.0

Positron Atlas

5.2X PERFORMANCE FOR 1/4 THE COST
165

Nvidia DGX-H100

93

Positron Performance and Efficiency Advantages in Software V1.x

August 2024
September 2024
Software Release
Models Benchmarked
Relative
Performance
Performance
per Watt
Advantage
Performance
per $
Advantage
Confidence
V1.0
Mixtral 8x7B
0.65*
2.1
1.5
Measured
V1.1
Mixtral 8x7B
Llama 3.1 70B
1.1*
3.9
2.6
In-dev, measured.
* Nvidia performance is based on vLLM 0.5.4 for both Mixtral 8x7B, Llama 3.1 8B, and Llama 3.1 70B.

Every Transformer Runs on Positron

Supports all Transformer models seamlessly with zero time and zero effort

Model Deployment on Positron in 4 Easy Steps

Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use

  • Develop or procure a model using the HuggingFace Transformers Library.

  • Upload or link trained model file (.pt or .safetensors) to Positron Model Manager.

  • Update client applications to use Positron’s OpenAI API-compliant endpoint.

  • Issue API requests and receive the best performance.

GCS

GCS

Amazon S3

Amazon S3

Files

.pt

.safetensors

Drag & Drop to uploadorBROWSE FILES

“mistralai/Mixtral-8x7B-Instruct-v0.1”

Hugging Face
Positron

Rest API { }

Model Manager

Model Loader

HF Model Fetcher

from openai import OpenAI
client = OpenAI(uri="api.positron.ai")

client.chat.completions
.create(
model="mixtral8x7b"
)

OpenAI-compatible

Python client

Increased density for power-constrained racks

Based on V1.1 power and performance.

Mixtral 8x7B Performance
DGX-H100
Atlas α
Aggregate Tokens per Second (TPS)
744
4400
Number of Users
8
40
DGX-H100
5,900 W
Atlas α
Atlas α
Atlas α
Atlas α
Atlas α

Upcoming events

AI Hardware and Edge AI Summit
September 10, 2024AI Hardware and Edge AI SummitSignia By Hilton, San Jose, CAGo to event →
NeurIPS 2024
December 9, 2024NeurIPS 2024Vancouver Convention Centre, Vancouver, CanadaGo to event →
Go to events