EXXA is excited to announce the launch of prompt caching, a powerful feature that improves the cost-effectiveness of our AI inference services.
With prompt caching, developers can now store and reuse context between API calls within the same batch.
Prompt caching allows you to store large portions of your prompts, such as background information, instructions, or example outputs, and reuse them across multiple API calls. This feature is particularly beneficial for applications that require consistent context or extensive background knowledge.
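To make this concrete, here is a minimal sketch of a batch whose requests all repeat the same large context. The request shape (the `system` and `messages` fields) is an assumption for illustration, not EXXA's actual API schema; the point is that the shared portion only needs to be processed once.

```python
# Hypothetical request shape -- field names are assumptions, not EXXA's schema.
# The large shared context is identical across the batch, so it is a natural
# candidate for prompt caching.

SHARED_CONTEXT = (
    "You are a support assistant for Acme Corp. "
    "Product manual: ... (several thousand tokens of background) ..."
)

questions = [
    "How do I reset my password?",
    "What is the refund policy?",
    "How do I export my data?",
]

# Every request carries the same context; caching lets the provider store
# that shared portion once and reuse it for the rest of the batch.
batch = [
    {"system": SHARED_CONTEXT, "messages": [{"role": "user", "content": q}]}
    for q in questions
]
```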
Prompt caching is useful in many scenarios, most notably:
Prompt caching is applied automatically to all requests within the same batch: we detect whether part of a request can be cached and reused by other requests. This happens in two steps:
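The detection step above can be sketched as a simple prefix computation. This is an illustrative toy, assuming caching operates on a shared leading prefix; the real service does this internally at the token level rather than on raw strings.

```python
import os.path  # os.path.commonprefix works on any sequence of strings


def split_cacheable(prompts):
    """Toy sketch of the two steps: (1) detect the portion shared by every
    request in the batch, (2) separate it from the per-request suffixes that
    still need fresh processing."""
    shared = os.path.commonprefix(prompts)
    suffixes = [p[len(shared):] for p in prompts]
    return shared, suffixes


prompts = [
    "Context: manual.\nQ: How do I reset?",
    "Context: manual.\nQ: What is the refund policy?",
]
shared, suffixes = split_cacheable(prompts)
# `shared` holds the reusable context; each suffix is the unique part.
```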
The main advantage of prompt caching is that it reduces the cost of requests that share the same context. Here are the pricing details:
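As a back-of-the-envelope illustration of the savings, consider a batch with a large shared prefix. The prices and discount below are made-up assumptions for the arithmetic, not EXXA's actual rates.

```python
# Made-up numbers for illustration: $1.00 per million input tokens,
# with cached tokens billed at a 50% discount. Not EXXA's actual pricing.

PRICE_PER_MTOK = 1.00
CACHED_DISCOUNT = 0.5


def batch_input_cost(shared_tokens, per_request_tokens, n_requests, cached):
    """Input cost in dollars for a batch where every request shares a prefix."""
    if not cached:
        total = (shared_tokens + per_request_tokens) * n_requests
        return total * PRICE_PER_MTOK / 1_000_000
    # First request pays full price for the shared prefix; subsequent
    # requests pay the discounted rate on it.
    first = shared_tokens + per_request_tokens
    rest = (shared_tokens * CACHED_DISCOUNT + per_request_tokens) * (n_requests - 1)
    return (first + rest) * PRICE_PER_MTOK / 1_000_000


# 50 requests, each with a 100k-token shared context and a 200-token question:
without = batch_input_cost(100_000, 200, 50, cached=False)  # $5.01
with_cache = batch_input_cost(100_000, 200, 50, cached=True)  # $2.56
```

With these assumed rates, caching roughly halves the input cost once the shared prefix dominates each request.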
To start using prompt caching, you need to: