A Cache For Your LLM

Butter is a cache that identifies patterns in LLM responses and saves you money by serving cached responses directly instead of calling the model again.

It's also deterministic, allowing your AI systems to consistently repeat past behaviors.
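Butter's internals aren't public, so as a rough illustration only, here is a minimal sketch of the core idea behind a deterministic response cache: key each chat request by its content, and serve the stored response on any repeat request. The function and variable names are hypothetical, and Butter's actual pattern matching is more sophisticated than the exact-match lookup shown here.

```python
import hashlib
import json

# Hypothetical sketch: an exact-match response cache for chat requests.
_cache: dict[str, str] = {}

def cache_key(model: str, messages: list[dict]) -> str:
    # Canonical JSON so identical requests always hash to the same key
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def complete(model: str, messages: list[dict], llm_call) -> str:
    key = cache_key(model, messages)
    if key not in _cache:
        # Cache miss: pay for one real LLM call and store the result
        _cache[key] = llm_call(model, messages)
    # Cache hit: deterministic replay of the stored response, no token cost
    return _cache[key]
```

Because the key is a hash of the full request, a repeated request deterministically returns the identical stored response rather than a fresh (and possibly different) generation.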

It's live — try it out here.


Chat Completions Compatible

Butter is a Chat Completions API endpoint, making it easy to drop right into your favorite tools like LangChain, Mastra, Crew AI, Pydantic AI, AI Suite, Helicone, LiteLLM, Martian, Browser Use, DSPy, and more.


from openai import OpenAI

# Repoint your client at Butter's proxy
client = OpenAI(
    base_url="https://proxy.butter.dev/v1",
)

# Requests now route through Butter
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any model your provider supports
    messages=[{"role": "user", "content": "Hello!"}],
)

Who It's For

Butter is built for autonomous agents that use tools to perform repetitive work, often back-office tasks such as data entry, computer use, and research.

Read more here.

Priced For You

We charge 5% of what we save you on your token bill: if Butter saves you $100 in tokens, you pay $5.

(but it's free for now)