A Cache For Your LLM
Butter is a cache that identifies patterns in LLM responses and saves you money by serving cached responses directly, skipping the model call.
It's also deterministic, allowing your AI systems to consistently repeat past behaviors.
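To make the idea concrete, here is a minimal sketch of deterministic response caching: identical requests hash to the same key, so a repeat call returns the stored response without touching the LLM. (This is illustrative only, not Butter's actual implementation; the function and variable names are invented for the example.)

```python
import hashlib
import json

# In-memory cache mapping request hashes to stored responses.
# (Hypothetical sketch; Butter's real matching is pattern-based.)
cache = {}

def cache_key(model, messages):
    # Canonicalize the request so identical inputs always hash identically.
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def complete(model, messages, llm_call):
    key = cache_key(model, messages)
    if key not in cache:
        # Cache miss: pay for tokens once and store the result.
        cache[key] = llm_call(model, messages)
    # Cache hit: the same response is served deterministically, for free.
    return cache[key]
```

Because the key is derived only from the request contents, replaying the same request always yields the same answer, which is what lets an agent repeat past behavior exactly.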
It's live — try it out here.
Chat Completions Compatible
Butter is a Chat Completions API endpoint, making it easy to drop right into your favorite tools like LangChain, Mastra, Crew AI, Pydantic AI, AI Suite, Helicone, LiteLLM, Martian, Browser Use, DSPy, and more.
from openai import OpenAI

# Repoint your client at the Butter proxy
# (reads your API key from the OPENAI_API_KEY environment variable)
client = OpenAI(
    base_url="https://proxy.butter.dev/v1",
)

# Requests now route through Butter
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any Chat Completions model
    messages=[{"role": "user", "content": "Hello!"}],
)
Who It's For
Butter is built for autonomous agents that use tools to perform repetitive work, often back-office tasks like data entry, computer use, and research.
Priced For You
We charge 5% of what we save you on your token bill.
(but it's free for now)