POSITRON

Now Available

Atlas

Transformer Inference Server

  • 4x Performance per Watt vs. GPUs

  • 2.5x Performance per Dollar vs. H100

Now Available

Cloud

Managed Transformer Inference

  • High-performance, low-latency model inference.

Mixtral 8x7B tokens per second (per user)

01 · TPS / User @ Batch 1: Positron Atlas 328 vs. Nvidia DGX-H100 74 (4.4x performance for 50% of the cost)

02 · TPS / User @ Batch 32: Positron Atlas 267 vs. Nvidia DGX-H100 55 (4.9x performance for 50% of the cost)
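The headline multipliers follow directly from the per-user throughput figures above; a quick sanity check in Python (the 50%-of-cost figure is taken from the charts, not derived here):

```python
# Per-user tokens-per-second figures from the benchmark charts above.
atlas = {"batch_1": 328, "batch_32": 267}
dgx_h100 = {"batch_1": 74, "batch_32": 55}

for batch in ("batch_1", "batch_32"):
    speedup = atlas[batch] / dgx_h100[batch]
    # At half the price, performance per dollar is double the raw speedup.
    perf_per_dollar = speedup / 0.5
    print(f"{batch}: {speedup:.1f}x performance, {perf_per_dollar:.1f}x perf/$")
```

This reproduces the 4.4x and 4.9x figures shown above, and implies 8.9x and 9.7x performance per dollar at half the cost.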

Positron Performance and Efficiency Advantages in Software V1.x

Software Release | Date | Models Benchmarked | Relative Performance | Performance per Watt Advantage | Performance per $ Advantage | Confidence
V1.0 | August 2024 | Mixtral 8x7B | 0.65* | 2.1 | 1.5 | Measured
V1.1 | September 2024 | Mixtral 8x7B, Llama 3.1 70B | 1.1* | 3.9 | 2.6 | In-dev, measured

* Nvidia performance is based on vLLM 0.5.4 for Mixtral 8x7B, Llama 3.1 8B, and Llama 3.1 70B.

Every Transformer Runs on Positron

Supports every Transformer model out of the box, with no porting time and no engineering effort

Model Deployment on Positron in 4 Easy Steps

Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use

  • Develop or procure a model using the HuggingFace Transformers Library.

  • Upload or link trained model file (.pt or .safetensors) to Positron Model Manager.

  • Update client applications to use Positron’s OpenAI API-compliant endpoint.

  • Issue API requests and receive the best performance.

[Diagram: model ingestion paths. Trained weights (.pt or .safetensors) can be dragged and dropped, uploaded from local files, or linked from Amazon S3 or GCS into the Positron Model Manager; alternatively, the HF Model Fetcher pulls a model directly from Hugging Face by ID (e.g. "mistralai/Mixtral-8x7B-Instruct-v0.1"). The Model Manager and Model Loader then serve the model behind a REST API.]

from openai import OpenAI

# Point the standard OpenAI Python client at Positron's endpoint.
client = OpenAI(base_url="https://api.positron.ai", api_key="POSITRON_API_KEY")

response = client.chat.completions.create(
    model="mixtral8x7b",
    messages=[{"role": "user", "content": "Hello, Atlas!"}],
)

OpenAI-compatible

Python client

Increased density for power-constrained racks

Based on V1.1 power and performance.

[Chart: Mixtral 8x7B performance at a fixed 5,900 W rack power budget. One Nvidia DGX-H100 delivers 744 aggregate tokens per second (TPS) across 8 users; five Atlas α systems in the same budget deliver 4,400 aggregate TPS across 40 users.]
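The density gain can be checked from the chart's own numbers; a small sketch:

```python
# Fixed rack power envelope from the chart: 5,900 W (one DGX-H100's draw).
dgx_tps, dgx_users = 744, 8        # one DGX-H100
atlas_tps, atlas_users = 4400, 40  # Atlas α systems in the same envelope

throughput_gain = atlas_tps / dgx_tps  # aggregate tokens per second
user_gain = atlas_users / dgx_users    # concurrent users served

print(f"{throughput_gain:.1f}x aggregate TPS, {user_gain:.0f}x users")
```

That is roughly 5.9x the aggregate throughput and 5x the concurrent users without increasing rack power.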

Upcoming events

AI Hardware and Edge AI Summit
November 10, 2024 · Signia by Hilton, San Jose, CA

Positron Developer Meetup
December 8, 2024 · Replicate HQ, San Francisco, CA

NeurIPS 2024
January 18, 2025 · Vancouver Convention Centre, Vancouver, Canada