POSITRON
Atlas
Transformer Inference Server
4x Performance per Watt vs. GPUs
2.5x Performance per Dollar vs. H100
Cloud
Managed Transformer Inference
High-performance, low-latency model inference.
Performance versus H100 (Tokens per Second, Mixtral 8x7B)
[Chart: Positron Atlas (Release 1.1 and Release 2.0) vs. Nvidia DGX-H100]
Positron Performance and Efficiency Advantages in Software V1.x
[Table: Performance-per-Watt Advantage and Performance-per-$ Advantage, measured on Llama 3.1 70B]
Every Transformer Runs on Positron
Supports any Transformer model out of the box, with no porting or conversion effort.
Model Deployment on Positron in 4 Easy Steps
Positron maps any trained HuggingFace Transformers Library model directly onto hardware for maximum performance and ease of use
1. Develop or procure a model using the HuggingFace Transformers Library.
2. Upload or link the trained model file (.pt or .safetensors) to the Positron Model Manager.
3. Update client applications to use Positron's OpenAI API-compliant endpoint.
4. Issue API requests and receive responses at full performance.
[Diagram: model sources (GCS, Amazon S3, .pt/.safetensors files, or a HuggingFace model ID such as "mistralai/Mixtral-8x7B-Instruct-v0.1") feed the Model Manager, Model Loader, and HF Model Fetcher, which serve inference through a REST API.]
OpenAI-compatible Python client

client = OpenAI(base_url="https://api.positron.ai")
response = client.chat.completions.create(...)
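Because the endpoint is OpenAI API-compliant, any OpenAI-style client works against it; the snippet above shows the official SDK pointed at Positron via its base URL. As a self-contained sketch, the same request can be built and sent with only the Python standard library. The full endpoint path, model name, and API-key handling below are assumptions based on the OpenAI chat-completions convention; adjust them to match your deployment.

```python
# Sketch: issuing a chat-completion request to an OpenAI-compatible
# endpoint using only the standard library. The base URL path ("/v1")
# and model ID are assumptions, not confirmed Positron specifics.
import json
import urllib.request

POSITRON_BASE = "https://api.positron.ai/v1"  # assumed scheme and path

def build_chat_request(model, messages, max_tokens=256):
    """Build an OpenAI-style /chat/completions request payload."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def chat(payload, api_key="YOUR_API_KEY"):
    """POST the payload to the chat-completions endpoint and return JSON."""
    req = urllib.request.Request(
        f"{POSITRON_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    payload = build_chat_request(
        "mistralai/Mixtral-8x7B-Instruct-v0.1",
        [{"role": "user", "content": "Hello"}],
    )
    # chat(payload)  # requires a live endpoint and a valid API key
    print(json.dumps(payload, indent=2))
```

Since the wire format is the standard OpenAI one, switching an existing application over is limited to changing the base URL and credentials, which is what makes step 3 of the deployment flow a one-line change.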