Optimize the LLM layer of your stack by continuously using the best-suited model for each task, to reduce LLM costs and improve latency. When relevant, leverage past logs to finetune a smaller model in a few clicks.
Cost reduction by leveraging the most suitable smaller, finetuned models
Latency improvements by leveraging smaller and faster models
CO2 emissions reduction by switching to more energy-efficient models (impact currently being quantified)
Install the EXXA solution in a few lines of code and start collecting logs from your existing LLMs (e.g. GPT-4, Claude 2.1).
Define metrics and monitor the performance of your current LLM. Our solution automatically predicts the performance of alternatives (pre-trained and finetuned models).
Finetune smaller models in a few clicks by leveraging past logs.
Evaluate and compare alternatives against your existing models based on the predefined metrics (incl. A/B testing, manual and automatic evaluations); a minimal A/B sketch follows this list.
Replace the existing model with the optimized one in a single click from the EXXA platform.
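To make the A/B testing step concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than the actual EXXA API: the `call_llm` helper, the model names, and the JSONL log format are placeholders. The idea is to route a small share of live traffic to the candidate model and record each response with its arm and latency, so the predefined metrics can be computed per arm offline.

```python
import json
import random
import time

CANDIDATE_TRAFFIC_SHARE = 0.1  # route 10% of requests to the candidate model


def call_llm(model: str, prompt: str) -> str:
    """Placeholder for your existing LLM call (OpenAI, Anthropic, ...)."""
    raise NotImplementedError


def handle_request(prompt: str) -> str:
    # Randomly assign each request to the incumbent or the candidate arm.
    arm = "candidate" if random.random() < CANDIDATE_TRAFFIC_SHARE else "incumbent"
    model = "mistral-7b-finetuned" if arm == "candidate" else "gpt-4"

    start = time.monotonic()
    answer = call_llm(model, prompt)
    latency = time.monotonic() - start

    # Log arm, latency, and output so automatic or manual metrics
    # can later be computed per arm.
    with open("ab_log.jsonl", "a") as f:
        f.write(json.dumps({
            "arm": arm,
            "model": model,
            "latency_s": round(latency, 3),
            "prompt": prompt,
            "answer": answer,
        }) + "\n")
    return answer
```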
Integrate the EXXA solution in a few minutes: simply change a few lines of code to start collecting data, as in the sketch below.
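As a hedged illustration of what those few lines can look like, here is a thin wrapper around an OpenAI chat call that captures the request, response, latency, and token usage. This is a generic sketch writing to a local JSONL file, not the actual EXXA SDK; in practice the records would be shipped to EXXA cloud or your private deployment.

```python
import json
import time

from openai import OpenAI

client = OpenAI()


def log_llm_call(record: dict, path: str = "llm_logs.jsonl") -> None:
    # Append one JSON record per call; a real integration would send
    # this to a collection backend instead of a local file.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")


def chat(messages: list[dict], model: str = "gpt-4") -> str:
    start = time.monotonic()
    response = client.chat.completions.create(model=model, messages=messages)
    answer = response.choices[0].message.content
    log_llm_call({
        "model": model,
        "messages": messages,
        "answer": answer,
        "latency_s": round(time.monotonic() - start, 3),
        "usage": response.usage.model_dump() if response.usage else None,
    })
    return answer
```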
When you use EXXA cloud, your data is safeguarded in Europe in full compliance with GDPR.
Option for private deployment of our solution on your VPC (Virtual Private Cloud) or on-premise infrastructure.
| Base model | Finetuning | Input | Output |
|---|---|---|---|
| Mistral 7B | €4.0 / M tokens | €1.1 / M tokens | €1.5 / M tokens |
| Llama-2 7B | €4.0 / M tokens | €1.1 / M tokens | €1.5 / M tokens |
| Llama-2 13B | €8.0 / M tokens | €2.2 / M tokens | €3.0 / M tokens |
| GPT-3.5 Turbo | $8.0 / M tokens | $3.0 / M tokens | $6.0 / M tokens |
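As a worked example using the rates above (workload sizes are illustrative): serving 10M input tokens and 2M output tokens per month on a finetuned Mistral 7B costs 10 × €1.1 + 2 × €1.5 = €14, versus 10 × $3.0 + 2 × $6.0 = $42 on a finetuned GPT-3.5 Turbo, before the one-off finetuning cost (e.g. 5M training tokens × €4.0 = €20 on Mistral 7B).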
Deploy any open-source model on your on-premise infrastructure or VPC (Virtual Private Cloud) to meet the highest standards of data security and confidentiality.
Shift from the inefficient deployment of each finetuned model on its own dedicated GPU to serving many finetuned models in parallel on a single GPU, as sketched below.
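One common way to achieve this, sketched below under the assumption of LoRA-style finetuning (this is not necessarily EXXA's internal implementation, and the adapter repository names are hypothetical placeholders): each finetuned model is a small adapter that shares one copy of the base weights, so several of them fit on a single GPU, e.g. with Hugging Face PEFT.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(BASE)

# Load the base weights once on a single GPU...
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="cuda"
)

# ...and attach several lightweight finetuned adapters to it
# (adapter repo names are hypothetical).
model = PeftModel.from_pretrained(
    base, "acme/support-triage-lora", adapter_name="support_triage"
)
model.load_adapter("acme/contract-summary-lora", adapter_name="contract_summary")


def generate(task: str, prompt: str) -> str:
    model.set_adapter(task)  # pick the finetuned "model" per request
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Serving stacks such as vLLM take this further by batching requests for different LoRA adapters into the same forward pass, which is what lets one GPU replace a fleet of dedicated ones.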