Introducing Prompt Caching


EXXA is excited to announce the launch of prompt caching, a feature that makes our AI inference services more cost-effective. With prompt caching, developers can now store and reuse context between API calls within the same batch.

Published: Oct 08, 2024
Reading time: 2 min
Author: EXXA team

What is prompt caching?

Prompt caching allows you to store large portions of your prompts, such as background information, instructions, or example outputs, and reuse them across multiple API calls. This feature is particularly beneficial for applications that require consistent context or extensive background knowledge.
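To make the idea concrete, here is a minimal sketch of a batch whose prompts share the same context. It does not call the EXXA API; `SHARED_CONTEXT`, `build_batch`, and `shared_prefix_ratio` are hypothetical names used only for illustration. It also assumes that the reusable part is a common prefix and measures it in characters, whereas a real cache would most likely work on tokens.

```python
from os.path import commonprefix

# Hypothetical shared context: in practice this would be long background
# information, instructions, or example outputs.
SHARED_CONTEXT = (
    "You are a support assistant.\n"
    "Background: the product manual goes here.\n"
)


def build_batch(shared_context, questions):
    """Return one prompt per question, each starting with the same context."""
    return [f"{shared_context}\nQuestion: {q}" for q in questions]


def shared_prefix_ratio(prompts):
    """Share of all characters in the batch covered by the common prefix.

    Character-based approximation (assumption): a real cache would
    typically count tokens, not characters.
    """
    total = sum(len(p) for p in prompts)
    if total == 0:
        return 0.0
    prefix = commonprefix(prompts)  # longest common leading substring
    return len(prefix) * len(prompts) / total


if __name__ == "__main__":
    batch = build_batch(SHARED_CONTEXT, ["How do I reset it?", "Is it waterproof?"])
    print(f"Reusable share of the batch: {shared_prefix_ratio(batch):.0%}")
```

The higher this ratio, the larger the part of each request that could in principle be reused across the batch. Placing the shared material first and the request-specific part last keeps the common prefix as long as possible.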

When to use prompt caching?

Prompt caching is useful in many scenarios, particularly:

How does it work?

Prompt caching is applied automatically to all your requests within the same batch: we detect whether part of a request can be cached and reused for other requests. We manage it in two steps:

Pricing

The main advantage of prompt caching is that it allows you to reduce the costs of requests sharing the same context. Here are the pricing details:

How we compare with other providers

Start using it today!

To start using prompt caching, you need to:

If you have any questions or want access to additional models, contact us at founders@withexxa.com.
