
Reduce the
financial & environmental
costs of LLMs

Optimize your LLM layer by continuously using the most adapted model for each task, reducing LLM costs and CO2 emissions. When relevant, leverage past logs to finetune a smaller model in a few clicks.

Join Beta waitlist

Solution Key benefits

Up to 90%

Cost reduction by leveraging the most adapted, smaller, finetuned models


Latency improvements
by leveraging smaller and faster models


CO2 emissions reduction by switching to more energy-efficient models (impact currently being quantified)

Solution Overview A multitude of AI models with different capabilities is available, and you do not always need the largest or best-performing one. EXXA facilitates selecting the most adapted model for each task.

Solution How it works

Monitoring SDK


Install the EXXA solution in a few lines of code and start collecting logs from your existing LLMs (e.g. GPT-4, Claude 2.1)

# Tokens
User satisfaction
Distance to running model
And more
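As an illustration of what log collection could look like, here is a minimal sketch of a monitoring wrapper around an LLM call. The `monitor` decorator, the `LOGS` store, and the word-count token proxy are all assumptions for illustration, not the actual EXXA SDK.

```python
import time
from functools import wraps

# Hypothetical in-memory log store; the real EXXA SDK schema is not
# shown here, so this only illustrates the kind of data collected.
LOGS = []

def monitor(model_name):
    """Wrap any LLM call to capture per-request metrics."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(prompt, **kwargs):
            start = time.perf_counter()
            response = fn(prompt, **kwargs)
            LOGS.append({
                "model": model_name,
                "prompt_tokens": len(prompt.split()),       # rough word-count proxy for # tokens
                "completion_tokens": len(response.split()),
                "latency_s": time.perf_counter() - start,
            })
            return response
        return wrapper
    return decorator

# Stubbed LLM call standing in for e.g. GPT-4 or Claude 2.1
@monitor("gpt-4")
def ask_llm(prompt):
    return "stub answer to: " + prompt

ask_llm("What is the capital of France?")
```

Once requests flow through a wrapper like this, each entry in the log carries the model name, token counts, and latency needed for the monitoring dashboard.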


Define metrics and monitor current LLM performance. Our solution automatically predicts the performance of alternatives (pre-trained and finetuned models).

Parallel adapters


Finetune smaller models in a few clicks by leveraging past logs.
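To give a sense of how past logs can feed a finetuning run, here is a hedged sketch that converts prompt/response logs into a chat-style JSONL dataset. The field names follow a common finetuning-data convention; the exact format EXXA uses is an assumption here.

```python
import json

# Hypothetical past logs collected by the monitoring step
logs = [
    {"prompt": "Summarize: LLM costs are rising.", "response": "Costs are rising."},
    {"prompt": "Translate 'bonjour' to English.", "response": "Hello."},
]

def logs_to_finetune_jsonl(logs):
    """Turn past prompt/response logs into chat-style finetuning rows (one JSON object per line)."""
    lines = []
    for row in logs:
        lines.append(json.dumps({
            "messages": [
                {"role": "user", "content": row["prompt"]},
                {"role": "assistant", "content": row["response"]},
            ]
        }))
    return "\n".join(lines)

dataset = logs_to_finetune_jsonl(logs)
```

Each logged interaction becomes one training example, so the dataset grows automatically as the monitoring SDK keeps collecting traffic.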

Private deployment


Evaluate and compare alternatives to existing models based on predefined metrics (incl. A/B Testing, manual and automatic evaluations).
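A conceptual sketch of such an offline comparison: replay logged prompts against two candidate models and score each with a predefined metric. The model names and the exact-match metric are illustrative placeholders, not EXXA's actual evaluation suite.

```python
# Reference answers for a set of logged prompts (ground truth)
reference = {
    "2+2?": "4",
    "Capital of France?": "Paris",
}

# Pre-computed candidate outputs for the same prompts (illustrative)
candidate_outputs = {
    "smaller-finetuned": {"2+2?": "4", "Capital of France?": "Paris"},
    "smaller-base":      {"2+2?": "4", "Capital of France?": "Lyon"},
}

def exact_match_score(outputs, reference):
    """Fraction of prompts where the candidate matches the reference exactly."""
    hits = sum(outputs[p] == answer for p, answer in reference.items())
    return hits / len(reference)

scores = {model: exact_match_score(outs, reference)
          for model, outs in candidate_outputs.items()}
```

In practice the exact-match metric would be replaced by the predefined metrics mentioned above (A/B testing, manual and automatic evaluations), but the compare-then-score loop stays the same.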

EXXA Why us?

Standardize practices


Integrate the EXXA solution in a few minutes. Simply change a few lines of code to start collecting data.

Competitive Open-source LLM


Option for private deployment of our solution on your VPC (Virtual Private Cloud) or on-premise infrastructure.

Pricing Plans



  • Monitoring & Analytics
  • Optimization predictions
  • Up to 25K logs collected


Per-token rates

  • Pay per token for finetuning
    (see below)
  • Pay per token for inference
    (see below)
  • Up to 10 finetuned models



  • Unlimited finetuned models
  • Custom support and development
  • Custom SLA
  • Option for private deployment

Pricing Per-token rates

Base model      Finetuning        Input             Output
Mistral 7B      $4.0 / M tokens   $0.5 / M tokens   $0.5 / M tokens
Llama-3 8B      $4.0 / M tokens   $0.5 / M tokens   $0.5 / M tokens
Mixtral 8x7B    $8.0 / M tokens   $1.2 / M tokens   $1.2 / M tokens
GPT 3.5 Turbo   $8.0 / M tokens   $3.0 / M tokens   $6.0 / M tokens
Note: GPT 3.5 Turbo rates are similar to those charged by OpenAI.
Note 2: Prices as of June 2024; subject to change and not contractual.
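A worked example of what these per-token rates mean in practice, using the inference prices from the table above (the monthly token volumes are made up for illustration):

```python
# Inference rates from the table, in dollars per million tokens (June 2024)
RATES = {
    "mistral-7b":    {"input": 0.5, "output": 0.5},
    "gpt-3.5-turbo": {"input": 3.0, "output": 6.0},
}

def monthly_cost(model, input_tokens, output_tokens):
    """Dollar cost of a month of inference at the table's per-token rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example workload: 50M input + 10M output tokens per month
cost_mistral = monthly_cost("mistral-7b", 50_000_000, 10_000_000)     # $30
cost_gpt35 = monthly_cost("gpt-3.5-turbo", 50_000_000, 10_000_000)   # $210
```

For this hypothetical workload, switching from GPT 3.5 Turbo to a finetuned Mistral 7B cuts inference spend from $210 to $30 per month, an illustration of where the "up to 90%" cost-reduction figure can come from.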

Solution Private Deployment Deploy the EXXA solution on your on-premise infrastructure or VPC (Virtual Private Cloud), and benefit from EXXA's breakthrough approach to parallelizing the deployment of multiple finetuned models on a shared GPU.

Private deployment

Privately deploy and finetune any open-source model

Deploy any open-source model on your on-premise infrastructure or VPC (Virtual Private Cloud) to meet the highest standards of data security and confidentiality.

Parallel adapters

Deploy up to 100 finetuned models on a single GPU

Shift from the inefficient deployment of each finetuned model on a dedicated GPU to a parallelized deployment on a single GPU.
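The idea can be sketched as follows: one base model stays resident on the GPU while many small, task-specific adapters (LoRA-style weight deltas) are switched in per request. The class names and the 100-adapter cap below are illustrative assumptions, not EXXA's actual implementation.

```python
class SharedBaseModel:
    """Stands in for one large model kept resident on a shared GPU."""
    def generate(self, prompt, adapter=None):
        tag = adapter if adapter else "base"
        return f"[{tag}] answer to: {prompt}"

class AdapterRouter:
    """Route each request to the right finetuned adapter on one GPU."""
    def __init__(self, base, max_adapters=100):
        self.base = base
        self.max_adapters = max_adapters
        self.adapters = {}

    def register(self, name):
        if len(self.adapters) >= self.max_adapters:
            raise RuntimeError("adapter slots exhausted")
        self.adapters[name] = name  # placeholder for the adapter's weights

    def generate(self, adapter_name, prompt):
        return self.base.generate(prompt, adapter=self.adapters[adapter_name])

router = AdapterRouter(SharedBaseModel())
router.register("support-tickets")
router.register("sql-generation")
reply = router.generate("support-tickets", "Reset my password")
```

Because each adapter is tiny compared to the base model, up to `max_adapters` finetuned variants share the memory footprint of a single deployment instead of each occupying its own GPU.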

Want to know more?

Get in touch