Integrations
OpenAI Integration
Integrate Fold with the OpenAI SDK for automatic context optimization. Works with GPT-4o, GPT-4 Turbo, and all other OpenAI chat models.
Installation
pnpm add @fold/sdk
Quick Start
The fastest way to add context optimization to your OpenAI calls:
import OpenAI from 'openai'
import { foldMessages } from '@fold/sdk/openai'
const openai = new OpenAI()
const optimize = foldMessages({ budget: 100_000 })
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: optimize(messages), // Automatically optimized!
})
console.log(optimize.saved())
// { tokens: 12000, percent: 62, cost: 0.12 }
Client Wrapper
For automatic optimization without changing your code, wrap the OpenAI client:
import OpenAI from 'openai'
import { wrapOpenAI } from '@fold/sdk/openai'
// Wrap once at initialization
const openai = wrapOpenAI(new OpenAI(), { budget: 100_000 })
// Use exactly like normal - optimization happens automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: longConversation,
})
// Access savings via the fold property
console.log(openai.fold.saved())
// { tokens: 45000, percent: 68, cost: 0.45 }
Agent Loop Pattern
For ReAct-style agents with tool calling, use the full fold() API:
import OpenAI from 'openai'
import { fold } from '@fold/sdk'
const openai = new OpenAI()
const ctx = fold("coding") // 100K budget, 15 turn window
ctx.system("You are a coding assistant with access to tools.")
const tools = [
  {
    type: "function",
    function: {
      name: "read_file",
      description: "Read a file from the filesystem",
      parameters: {
        type: "object",
        properties: {
          path: { type: "string", description: "File path" }
        },
        required: ["path"]
      }
    }
  },
  // ... more tools
]
// Agent loop
while (true) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: ctx.messages(), // Optimized!
    tools,
  })
  const message = response.choices[0].message
  // Handle tool calls
  if (message.tool_calls?.length) {
    for (const call of message.tool_calls) {
      // Track the action
      ctx.act(JSON.parse(call.function.arguments), call.function.name)
      // Execute the tool
      const result = await executeTool(call.function.name, call.function.arguments)
      // Track the result
      ctx.observe(result, call.function.name)
    }
  } else {
    // Track reasoning
    ctx.think(message.content)
  }
  // Check for stop signals (loops, failures, goal achieved)
  if (ctx.stop()) {
    console.log("Stopping:", ctx.reason())
    break
  }
}
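The executeTool helper in the loop is application code, not part of @fold/sdk. A minimal dispatcher might look like the sketch below; the read_file handler is illustrative and assumes Node.js:

```typescript
import { readFile } from 'node:fs/promises'

// Hypothetical dispatcher for the loop above. Receives the tool name
// and the raw JSON arguments string from the model's tool call.
async function executeTool(name: string, args: string): Promise<string> {
  const parsed = JSON.parse(args)
  switch (name) {
    case 'read_file':
      // Matches the read_file tool declared in the tools array
      return await readFile(parsed.path, 'utf8')
    default:
      return `Unknown tool: ${name}`
  }
}
```

Returning a string for unknown tools (rather than throwing) lets the result flow back to the model via ctx.observe() so it can recover.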
// Final savings report
console.log(ctx.saved())
// { tokens: 45000, percent: 68, cost: 0.45 }
Streaming Support
Both optimization methods work with streaming responses:
import OpenAI from 'openai'
import { foldMessages } from '@fold/sdk/openai'
const openai = new OpenAI()
const optimize = foldMessages({ budget: 100_000 })
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: optimize(messages),
  stream: true,
})
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || ''
  process.stdout.write(content)
}
Anthropic SDK
The same optimization works with the Anthropic SDK (Claude models):
import Anthropic from '@anthropic-ai/sdk'
import { foldAnthropicMessages } from '@fold/sdk/anthropic'
const anthropic = new Anthropic()
const optimize = foldAnthropicMessages({ budget: 100_000 })
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  messages: optimize(messages),
})
console.log(optimize.saved())
Configuration Options
budget
Maximum token budget. Context will be optimized to stay under this limit.
model
Model name for accurate tokenization. Defaults to "gpt-4o".
window
Number of recent turns to keep fully unmasked. Older turns are optimized.
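As an illustration of the window semantics (this is not SDK code, and it assumes one message per turn), splitting a conversation into optimizable history and a protected recent window looks like:

```typescript
type Message = { role: string; content: string }

// Illustrative only: partition messages so the last `window` turns
// stay untouched while older turns are eligible for optimization.
function splitByWindow(messages: Message[], window: number) {
  const cut = Math.max(messages.length - window, 0)
  return {
    optimizable: messages.slice(0, cut), // older turns, may be masked
    recent: messages.slice(cut),         // recent turns, kept in full
  }
}
```

With window: 15, a 20-turn conversation keeps the last 15 messages in full and exposes only the first 5 to optimization.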
const optimize = foldMessages({
  budget: 100_000, // Token budget
  model: 'gpt-4o', // For tokenization
  window: 15,      // Keep last 15 turns full
})
Best Practices
Set a realistic budget
Leave room for the model's response. If your model's context window is 128K, set budget to ~100K to leave space for output tokens.
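This rule of thumb can be captured in a small helper (my own sketch, not part of the SDK): subtract the maximum output tokens you expect from the model's context window.

```typescript
// Hypothetical helper, not part of @fold/sdk: derive a context budget
// by reserving room for the model's output tokens.
function contextBudget(contextWindow: number, maxOutputTokens: number): number {
  if (maxOutputTokens >= contextWindow) {
    throw new Error('maxOutputTokens must be smaller than the context window')
  }
  return contextWindow - maxOutputTokens
}

// A 128K-context model with up to 28K reserved for output
// leaves the 100K input budget used throughout the examples above.
const budget = contextBudget(128_000, 28_000)
console.log(budget) // 100000
```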
Use presets for common patterns
fold("coding") for coding agents, fold("chat") for conversational AI.
Monitor your savings
Call optimize.saved() regularly to track how much you're saving. Log this data for cost analysis.
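One way to accumulate these reports for cost analysis (an application-level sketch, assuming the { tokens, percent, cost } shape returned by saved() in the examples above):

```typescript
type Savings = { tokens: number; percent: number; cost: number }

// Collects per-request savings reports so totals can be logged later.
class SavingsLog {
  private entries: Savings[] = []

  record(s: Savings) {
    this.entries.push(s)
  }

  totalCostSaved(): number {
    return this.entries.reduce((sum, s) => sum + s.cost, 0)
  }
}

// Record the saved() result after each request, e.g.
// log.record(optimize.saved())
const log = new SavingsLog()
log.record({ tokens: 12_000, percent: 62, cost: 0.12 })
log.record({ tokens: 45_000, percent: 68, cost: 0.45 })
console.log(log.totalCostSaved().toFixed(2)) // 0.57
```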