
Optimize LLM Ops

One task. One model.

Optimize your LLM stack by continuously routing each task to the best-suited model, decreasing LLM costs and improving latency. When relevant, leverage past logs to finetune a smaller model in a few clicks.



Join Beta waitlist

Key benefits

Up to 90%

Cost reduction
by leveraging the most adapted,
smaller, finetuned models

Standardize practices

Latency improvements
by leveraging smaller and faster models


CO2 emissions reduction
by switching to more energy-efficient models (impact being quantified)

Overview

A multitude of AI models with different capabilities are available, and you do not always need the largest or best-performing ones. EXXA facilitates selecting the most adapted model for each task.

How it works

EXXA developed a breakthrough solution to parallelize the deployment of multiple finetuned models on a shared GPU.

Monitoring SDK
1

INSTALLATION

Install the EXXA solution in a few lines of code and start collecting your existing LLM logs (e.g. GPT-4, Claude 2.1):

  • Costs
  • Latency
  • Duration
  • # Tokens
  • Toxicity
  • User satisfaction
  • Distance to running model
  • And more
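The installation step can be sketched in Python. The `collect_log` helper and its field names are illustrative assumptions about what an EXXA-style monitoring wrapper could record per call, not EXXA's actual API:

```python
import time

LOGS = []

def collect_log(model, prompt, response, prompt_tokens, completion_tokens,
                latency_s, input_rate_eur, output_rate_eur):
    """Record one LLM call with metrics like those listed above (hypothetical)."""
    LOGS.append({
        "model": model,
        "prompt": prompt,
        "response": response,
        "tokens": prompt_tokens + completion_tokens,
        "latency_s": latency_s,
        # Cost in EUR at per-million-token rates.
        "cost_eur": (prompt_tokens * input_rate_eur
                     + completion_tokens * output_rate_eur) / 1_000_000,
    })

# Simulated call; a real integration would wrap e.g. an OpenAI client call.
start = time.perf_counter()
response = "Paris"  # stand-in for a model response
latency = time.perf_counter() - start
collect_log("gpt-4", "Capital of France?", response,
            prompt_tokens=6, completion_tokens=1,
            latency_s=latency, input_rate_eur=1.1, output_rate_eur=1.5)

print(LOGS[0]["tokens"])  # 7
```

Once calls are wrapped this way, every metric above accumulates automatically as traffic flows through the application.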
2

MONITOR & PREDICT

Define metrics and monitor current LLM performance. Our solution automatically predicts the performance of alternatives (pre-trained and finetuned models).

Parallel adapters
3

FINETUNE

Finetune smaller models in a few clicks by leveraging past logs.
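Conceptually, "leveraging past logs" means turning good interactions into training data. A minimal sketch, assuming hypothetical log fields (`user_score`, `prompt`, `response`), filters well-rated logs and exports prompt/completion pairs as JSONL, a common finetuning format:

```python
import json

logs = [
    {"prompt": "Summarize: ...", "response": "Short summary.", "user_score": 5},
    {"prompt": "Summarize: ...", "response": "Off-topic reply.", "user_score": 1},
]

def to_finetune_dataset(logs, min_score=4):
    """Keep only well-rated logs and shape them as training examples."""
    return [
        {"prompt": log["prompt"], "completion": log["response"]}
        for log in logs
        if log["user_score"] >= min_score
    ]

dataset = to_finetune_dataset(logs)
jsonl = "\n".join(json.dumps(row) for row in dataset)
print(len(dataset))  # 1 -- only the well-rated example survives
```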

Private deployment
4

EVALUATE

Evaluate and compare alternatives to your existing models based on predefined metrics (including A/B testing, manual and automatic evaluations).
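One simple comparison metric is a paired win rate: on the same examples, how often does the candidate (smaller, finetuned) model score at least as well as the incumbent? The scores below are illustrative; in practice they come from A/B tests or manual/automatic evaluations:

```python
def win_rate(candidate_scores, incumbent_scores):
    """Fraction of paired examples where the candidate scores at least as well."""
    wins = sum(c >= i for c, i in zip(candidate_scores, incumbent_scores))
    return wins / len(candidate_scores)

incumbent = [0.9, 0.8, 0.95, 0.7]   # e.g. quality scores of the current model
candidate = [0.9, 0.85, 0.9, 0.75]  # a finetuned 7B model on the same examples

print(win_rate(candidate, incumbent))  # 0.75
```

A win rate near or above 0.5 at a fraction of the cost is the typical signal that the switch is worthwhile.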

5

AUTOMATIC DEPLOYMENT

Replace the existing model with the optimized one in one click from the EXXA platform.

Why us?


Fast

Integrate the EXXA solution in a few minutes. Simply change a few lines of code to start collecting data.


GDPR compliant

When using the EXXA cloud, we safeguard your data in Europe and ensure GDPR compliance.


Private

Option for private deployment of our solution on your VPC (Virtual Private Cloud) or on-premise infrastructure.

Pricing Plans

Monitoring

Free


  • Monitoring & Analytics
  • Optimization predictions
  • Up to 25K logs collected

Pro

20€ / model
+ per token rates


  • 20€ per live model per month
  • Pay per token for finetuning
    (see below)
  • Pay per token for inference
    (see below)
  • Up to 10 finetuned models

Enterprise

Custom


  • Unlimited finetuned models
  • Custom support and development
  • Custom SLA
  • Option for private deployment

Per-token rates

Base model    | Finetuning      | Input           | Output
Mistral 7B    | €4.0 / M tokens | €1.1 / M tokens | €1.5 / M tokens
Llama-2 7B    | €4.0 / M tokens | €1.1 / M tokens | €1.5 / M tokens
Llama-2 13B   | €8.0 / M tokens | €2.2 / M tokens | €3.0 / M tokens
GPT 3.5 Turbo | $8.0 / M tokens | $3.0 / M tokens | $6.0 / M tokens

Note: GPT 3.5 Turbo rates are similar to those charged by OpenAI.
Note 2: Prices as of March 2024; subject to change; not contractual.
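A worked example of the rates above under the Pro plan: finetune Mistral 7B on 10M tokens of logs, then serve 40M input and 10M output tokens per month. The traffic volumes are illustrative; the rates come from the table:

```python
# Mistral 7B rates, EUR per million tokens, from the pricing table.
FINETUNE, INPUT, OUTPUT = 4.0, 1.1, 1.5
PRO_FEE = 20.0  # EUR per live model per month (Pro plan)

finetune_cost = 10 * FINETUNE                 # 10M tokens -> EUR 40.0, one-off
monthly_inference = 40 * INPUT + 10 * OUTPUT  # EUR 59.0 per month
monthly_total = PRO_FEE + monthly_inference   # EUR 79.0 per month
print(finetune_cost, monthly_total)           # 40.0 79.0
```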

Private Deployment

Deploy the EXXA solution on your on-premise infrastructure or VPC (Virtual Private Cloud), and benefit from EXXA's breakthrough solution to parallelize the deployment of multiple finetuned models on a shared GPU.


Privately deploy and finetune any open-source model

Deploy any open-source model on your on-premise infrastructure or VPC (Virtual Private Cloud) to meet the highest standards of data security and confidentiality.


Deploy up to 100 finetuned models on a single GPU

Shift from the inefficient deployment of each finetuned model on its own dedicated GPU to a parallelized deployment on a single GPU.
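The idea behind parallel adapters is that the large base model is loaded once and shared, while each finetuned task only adds a small set of adapter weights (LoRA-style) selected per request. A conceptual sketch, with hypothetical names rather than EXXA's actual implementation:

```python
class SharedBaseServer:
    """One base model in GPU memory, many lightweight per-task adapters."""

    def __init__(self, base_model):
        self.base_model = base_model  # loaded once, shared by all tasks
        self.adapters = {}            # small per-task weight deltas

    def register(self, task, adapter_weights):
        self.adapters[task] = adapter_weights

    def generate(self, task, prompt):
        # Only the lightweight adapter is swapped in per request.
        adapter = self.adapters[task]
        return f"{self.base_model}+{adapter}: reply to {prompt!r}"

server = SharedBaseServer("mistral-7b")
for task in ("summarize", "classify", "extract"):
    server.register(task, f"lora-{task}")

print(len(server.adapters))  # 3 finetuned tasks, one base model, one GPU
```

Because each adapter is a tiny fraction of the base model's size, packing up to 100 of them on a single GPU becomes feasible.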

Want to know more?

Get in touch