Positron Performance and Efficiency Advantages in Software V1.x
August 2024
September 2024
Software Release
Models Benchmarked
Relative
Performance
Performance
Performance
per Watt
Advantage
per Watt
Advantage
Performance
per $
Advantage
per $
Advantage
Confidence
V1.0
Mixtral 8x7B
0.65*
2.1
1.5
Measured
V1.1
Mixtral 8x7B
Llama 3.1 70B
Llama 3.1 70B
1.1*
3.9
2.6
In-dev, measured.
* Nvidia performance is based on vLLM 0.5.4 for both Mixtral 8x7B, Llama 3.1 8B, and Llama 3.1 70B.
Software & Systems Overview
Chat interface example
OpenAI compatible LLM API
Load balancer and scheduler
Transformer engine
Configurable accelerator
with field updates
Switch
System SWServer
Atlas
Atlas
Atlas
Atlas
Atlas
Atlas
Atlas
Atlas
Positron Atlas Hardware
Network
Scale-Up IOTransformer engine
Sys MemHost
CPU
CPU
AI Math
Accelerator
Accelerator
Mem
Every Transformer Runs on Positron
Supports all Transformer models seamlessly with zero time and zero effort
Model Deployment on Positron in 4 Easy Steps
Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use
Develop or procure a model using the HuggingFace Transformers Library.
Upload or link trained model file (.pt or .safetensors) to Positron Model Manager.
Update client applications to use Positron’s OpenAI API-compliant endpoint.
Issue API requests and receive the best performance.
GCS
Amazon S3
.pt
.safetensors
Drag & Drop to uploadorBROWSE FILES
“mistralai/Mixtral-8x7B-Instruct-v0.1”
Rest API { }
Model Manager
Model Loader
HF Model Fetcher
from openai import OpenAI
client = OpenAI(uri="api.positron.ai")
client.chat.completions
client = OpenAI(uri="api.positron.ai")
client.chat.completions
.create(
…
model="mixtral8x7b"
)
OpenAI-compatible
Python client
01
02
03