Integrations

OpenAI Integration

Integrate Fold with the OpenAI SDK for automatic context optimization. Works with GPT-4o, GPT-4 Turbo, and all other OpenAI chat models.

Installation

pnpm add @fold/sdk
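
Or with npm:

npm install @fold/sdk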

Quick Start

The fastest way to add context optimization to your OpenAI calls:

import OpenAI from 'openai'
import { foldMessages } from '@fold/sdk/openai'

const openai = new OpenAI()
const optimize = foldMessages({ budget: 100_000 })

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: optimize(messages),  // Automatically optimized!
})

console.log(optimize.saved())
// { tokens: 12000, percent: 62, cost: 0.12 }

Client Wrapper

To optimize every request without touching your call sites, wrap the OpenAI client once:

import OpenAI from 'openai'
import { wrapOpenAI } from '@fold/sdk/openai'

// Wrap once at initialization
const openai = wrapOpenAI(new OpenAI(), { budget: 100_000 })

// Use exactly like normal - optimization happens automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: longConversation,
})

// Access savings via the fold property
console.log(openai.fold.saved())
// { tokens: 45000, percent: 68, cost: 0.45 }

Agent Loop Pattern

For ReAct-style agents with tool calling, use the full fold() API:

import OpenAI from 'openai'
import { fold } from '@fold/sdk'

const openai = new OpenAI()
const ctx = fold("coding")  // 100K budget, 15-turn window

ctx.system("You are a coding assistant with access to tools.")

const tools = [
  {
    type: "function",
    function: {
      name: "read_file",
      description: "Read a file from the filesystem",
      parameters: {
        type: "object",
        properties: {
          path: { type: "string", description: "File path" }
        },
        required: ["path"]
      }
    }
  },
  // ... more tools
]

// Agent loop
while (true) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: ctx.messages(),  // Optimized!
    tools,
  })

  const message = response.choices[0].message

  // Handle tool calls
  if (message.tool_calls?.length) {
    for (const call of message.tool_calls) {
      // Parse the arguments once so tracking and execution see the same object
      const args = JSON.parse(call.function.arguments)

      // Track the action
      ctx.act(args, call.function.name)

      // Execute the tool (executeTool is your own dispatcher; see the sketch below)
      const result = await executeTool(call.function.name, args)

      // Track the result
      ctx.observe(result, call.function.name)
    }
  } else {
    // Track reasoning
    ctx.think(message.content)
  }

  // Check for stop signals (loops, failures, goal achieved)
  if (ctx.stop()) {
    console.log("Stopping:", ctx.reason())
    break
  }
}

// Final savings report
console.log(ctx.saved())
// { tokens: 45000, percent: 68, cost: 0.45 }
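
The executeTool function above is your own dispatcher, not part of @fold/sdk. A minimal sketch, assuming the parsed arguments object from the loop and only the read_file tool defined earlier:

import { readFile } from 'node:fs/promises'

// Hypothetical dispatcher: routes a tool call to its handler and
// returns a string result suitable for ctx.observe()
async function executeTool(name, args) {
  switch (name) {
    case 'read_file':
      return await readFile(args.path, 'utf8')
    default:
      return `Unknown tool: ${name}`
  }
}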

Streaming Support

Both the foldMessages helper and the wrapOpenAI client wrapper work with streaming responses:

import OpenAI from 'openai'
import { foldMessages } from '@fold/sdk/openai'

const openai = new OpenAI()
const optimize = foldMessages({ budget: 100_000 })

const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: optimize(messages),
  stream: true,
})

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || ''
  process.stdout.write(content)
}

Anthropic SDK

The same optimization works with the Anthropic SDK (Claude models):

import Anthropic from '@anthropic-ai/sdk'
import { foldAnthropicMessages } from '@fold/sdk/anthropic'

const anthropic = new Anthropic()
const optimize = foldAnthropicMessages({ budget: 100_000 })

const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  messages: optimize(messages),
})

console.log(optimize.saved())

Configuration Options

budget
Maximum token budget. Context will be optimized to stay under this limit.
model
Model name for accurate tokenization. Defaults to "gpt-4o".
window
Number of recent turns to keep fully unmasked. Older turns are optimized.

All three options together:

const optimize = foldMessages({
  budget: 100_000,    // Token budget
  model: 'gpt-4o',    // For tokenization
  window: 15,         // Keep last 15 turns full
})

Best Practices

Set a realistic budget
Leave room for the model's response. If your model's context window is 128K, set budget to ~100K to leave space for output tokens.
Use presets for common patterns
fold("coding") for coding agents, fold("chat") for conversational AI.
Monitor your savings
Call optimize.saved() regularly to track how much you're saving, and log the report for cost analysis; see the sketch below.
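
Putting the first and third practices together, a minimal sketch (the 28K headroom figure and JSON logging are illustrative choices, not SDK requirements):

import OpenAI from 'openai'
import { foldMessages } from '@fold/sdk/openai'

const CONTEXT_WINDOW = 128_000   // gpt-4o context window
const OUTPUT_HEADROOM = 28_000   // room reserved for the model's response

const openai = new OpenAI()
const optimize = foldMessages({ budget: CONTEXT_WINDOW - OUTPUT_HEADROOM })

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: optimize(messages),
})

// Log one savings record per call for later cost analysis
console.log(JSON.stringify({ at: Date.now(), ...optimize.saved() }))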