Off-Peak Computing is the most cost-efficient batch inference API for open-source models. Get high-quality output at the lowest price, for any use case that can tolerate some delay! (A minimal submission sketch follows the feature highlights below.)
Get all your requests completed in under 24 hours. Often much faster!
Per million tokens for Llama-3.3-70B-Instruct (FP16): $0.30 input / $0.50 output
No hard rate limit
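To make the batch workflow concrete, here is a minimal sketch of submitting a batch of requests. The endpoint URL, auth scheme, model id, and JSONL request shape are assumptions modeled on the common OpenAI-style batch format; check the EXXA API docs for the exact names.

```python
# Minimal sketch: submit a batch of chat requests as JSONL.
# The URL, auth header, model id, and field names below are assumptions
# (OpenAI-style batch conventions), not the confirmed EXXA API surface.
import json
import requests

API_URL = "https://api.withexxa.com/v1/batches"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

requests_jsonl = "\n".join(
    json.dumps({
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "llama-3.3-70b-instruct-fp16",  # assumed model id
            "messages": [{"role": "user", "content": prompt}],
        },
    })
    for i, prompt in enumerate(
        ["Hello!", "Summarize batch inference in one line."]
    )
)

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    files={"file": ("batch.jsonl", requests_jsonl)},
)
print(resp.json())  # poll the returned batch id until results are ready
```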
E.g. Use the best-performing models as a judge to evaluate the generations of smaller models (sketch below).
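As a sketch of the judge pattern, the snippet below turns (question, answer) pairs produced by a smaller model into scoring requests for a stronger model. The judge prompt, model id, and request shape are illustrative assumptions.

```python
# Sketch: build LLM-as-a-judge requests over a smaller model's outputs.
# Prompt wording, model id, and payload shape are assumptions.
import json

JUDGE_TEMPLATE = (
    "Rate the following answer from 1 to 10 for correctness and clarity.\n"
    "Question: {question}\nAnswer: {answer}\nReply with only the number."
)

def judge_requests(pairs, model="llama-3.3-70b-instruct-fp16"):
    for i, (question, answer) in enumerate(pairs):
        yield json.dumps({
            "custom_id": f"judge-{i}",
            "body": {
                "model": model,
                "messages": [{
                    "role": "user",
                    "content": JUDGE_TEMPLATE.format(
                        question=question, answer=answer
                    ),
                }],
                "temperature": 0,  # deterministic scoring
            },
        })

pairs = [("What is 2+2?", "4"), ("Capital of France?", "Lyon")]
with open("judge_batch.jsonl", "w") as f:
    f.write("\n".join(judge_requests(pairs)))
```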
E.g. Use the best-performing models to generate large amounts of synthetic data to train new models (sketch below).
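A hedged sketch of the synthetic-data pattern: fan a small list of seed instructions out into many high-temperature generation requests. The seed prompts, counts, model id, and request shape are assumptions.

```python
# Sketch: expand seed instructions into a large synthetic-data batch.
# Batch pricing is what makes this volume affordable.
import json

seeds = ["a math word problem", "a SQL question", "a short Python task"]
N_PER_SEED = 100  # illustrative volume

with open("synthetic_batch.jsonl", "w") as f:
    for s_idx, seed in enumerate(seeds):
        for k in range(N_PER_SEED):
            f.write(json.dumps({
                "custom_id": f"syn-{s_idx}-{k}",
                "body": {
                    "model": "llama-3.3-70b-instruct-fp16",  # assumed id
                    "messages": [{
                        "role": "user",
                        "content": f"Write {seed} with its full solution.",
                    }],
                    "temperature": 0.9,  # high temperature for diversity
                },
            }) + "\n")
```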
E.g. Classify large datasets of documents, customer feedback, or news articles on a daily basis (sketch below).
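One way this might look in practice, assuming the same JSONL request shape as above; the label set and prompt are illustrative:

```python
# Sketch: nightly classification of free-form documents into fixed labels.
import json

LABELS = ["billing", "bug report", "feature request", "other"]
PROMPT = ("Classify this customer feedback into exactly one of "
          f"{LABELS}. Reply with the label only.\n\n{{text}}")

def classification_batch(docs):
    return "\n".join(
        json.dumps({
            "custom_id": f"doc-{i}",
            "body": {
                "model": "llama-3.3-70b-instruct-fp16",  # assumed id
                "messages": [
                    {"role": "user", "content": PROMPT.format(text=d)}
                ],
                "temperature": 0,  # stable labels
            },
        })
        for i, d in enumerate(docs)
    )

print(classification_batch(["The app crashes when I upload a PDF."]))
```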
E.g. Translate large volumes of text into multiple languages using high-performing models like Llama-3-70B (sketch below).
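A sketch of the translation pattern, crossing one corpus with several target languages in a single batch; the language list and the llama-3-70b-instruct model id are assumptions:

```python
# Sketch: one request per (text, target language) pair.
import json

TARGETS = ["French", "German", "Spanish", "Japanese"]
corpus = ["Batch inference trades latency for cost."]

with open("translate_batch.jsonl", "w") as f:
    for i, text in enumerate(corpus):
        for lang in TARGETS:
            f.write(json.dumps({
                "custom_id": f"tr-{i}-{lang}",
                "body": {
                    "model": "llama-3-70b-instruct",  # assumed id
                    "messages": [{
                        "role": "user",
                        "content": f"Translate into {lang}:\n{text}",
                    }],
                },
            }) + "\n")
```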
E.g. Extract data from large documents in a specific format, using the structured output feature of the EXXA API (sketch below).
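The sketch below shows what a structured-output extraction request could look like. The response_format field follows the widespread JSON-schema convention; the exact EXXA parameter name and shape are assumptions to verify against the docs.

```python
# Sketch: constrain extraction output to a JSON schema.
# The response_format shape mirrors a common convention and is an
# assumption, not the confirmed EXXA parameter.
import json

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_number", "total", "currency"],
}

request = {
    "custom_id": "extract-0",
    "body": {
        "model": "llama-3.3-70b-instruct-fp16",  # assumed id
        "messages": [{
            "role": "user",
            "content": "Extract the fields from this invoice:\n<document text>",
        }],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"schema": schema},
        },
    },
}
print(json.dumps(request, indent=2))
```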
E.g. Summarize customer or internal chatbot conversations on a daily basis to reduce storage requirements (sketch below).
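Finally, a sketch of a nightly summarization job: each day's transcripts are condensed so only the short summaries need long-term storage. The prompt, model id, and request shape are assumptions.

```python
# Sketch: summarize each conversation transcript into a compact record.
import json

def summarize_batch(conversations):
    for conv_id, transcript in conversations.items():
        yield json.dumps({
            "custom_id": f"sum-{conv_id}",
            "body": {
                "model": "llama-3.3-70b-instruct-fp16",  # assumed id
                "messages": [{
                    "role": "user",
                    "content": ("Summarize this conversation in under 100 "
                                "words, keeping decisions and action items:\n"
                                + transcript),
                }],
            },
        })

convs = {"c1": "user: my order is late\nagent: refunded shipping fee"}
with open("summaries_batch.jsonl", "w") as f:
    f.write("\n".join(summarize_batch(convs)))
```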