Fold SDK Documentation

Smart context management for LLMs. Reduce token costs by 50-78% without sacrificing task performance.

# Install the SDK
npm install @fold/sdk
// Use it in your code
import { fold } from '@fold/sdk'

const ctx = fold()  // That's it!

ctx.system("You are a helpful assistant")
ctx.think("I need to search for information...")
ctx.act({ tool: "search", query: "fold sdk" }, "search")
ctx.observe("Found 3 results...", "search")

// Get optimized messages for your LLM
const messages = ctx.messages()

// Check your savings
console.log(ctx.saved())
// { tokens: 5000, percent: 45, cost: 0.05 }

Get Started

  • Quick Start: Get up and running in under 5 minutes
  • Coding Agents: Build agents like Claude Code or Cursor
  • API Reference: Complete SDK documentation

What is Fold?

Fold is an intelligent context compression platform for LLM-powered agents. LLM agents operate in loops: reason → act → observe → repeat. Each iteration appends to the context window, and because the full history is re-sent on every turn, total token costs grow quadratically with the number of turns.
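That growth can be sketched with a few lines of arithmetic (token counts here are illustrative, not Fold's accounting):

```typescript
// Sketch: cumulative input tokens for an agent loop in which every turn
// re-sends the entire history. Illustrative numbers only.
function cumulativeInputTokens(turns: number, tokensPerTurn: number): number {
  let history = 0; // tokens accumulated in the context window so far
  let total = 0;   // total input tokens billed across all turns
  for (let i = 0; i < turns; i++) {
    history += tokensPerTurn; // each turn appends ~tokensPerTurn to context
    total += history;         // and the whole history is re-sent as input
  }
  return total;
}

const single = cumulativeInputTokens(1, 500);  // one prompt
const fifty = cumulativeInputTokens(50, 500);  // a 50-turn loop
console.log(fifty / single); // grows roughly with the square of the turn count
```

Masking or summarizing old turns shrinks `history`, which is exactly where the quadratic blow-up comes from.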

A 50-turn agent conversation can cost 2,500x more than a single prompt. Fold solves this through:

  • Masking — Replace old observations with placeholders (cheap, fast)
  • Summarization — Compress context via LLM when needed (powerful, selective)
  • Anchor Detection — Protect important turns from optimization
  • Stop Signal Detection — Prevent agents from wasting tokens on impossible tasks
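A rough sketch of the masking idea, the first technique above. The turn shape and placeholder text are assumed for illustration and are not Fold's internals:

```typescript
// Sketch (assumed types, not the SDK's source): tool observations older
// than the recency window are swapped for a cheap placeholder string.
type Turn = { role: "assistant" | "tool"; content: string };

function maskOldObservations(turns: Turn[], window: number): Turn[] {
  const cutoff = turns.length - window; // turns before this index are "old"
  return turns.map((t, i) =>
    t.role === "tool" && i < cutoff
      ? { ...t, content: "[observation masked]" } // placeholder, not the real result
      : t
  );
}
```

Masking is cheap because it needs no LLM call; summarization, by contrast, spends tokens to compress but preserves more information.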

Presets

fold()              // Default: 100K budget, 10 turn window
fold("chat")        // 32K budget, 20 turn window
fold("coding")      // 100K budget, 15 turn window
fold("research")    // 128K budget, 10 turn window
fold("long-running") // 200K budget, 8 turn window

// Or custom
fold({ budget: 50_000, model: "gpt-4o", window: 15 })
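The presets above boil down to budget/window pairs. A table of the values listed, as a plain object (the shape and names here are illustrative, not the SDK's source):

```typescript
// Budgets and windows transcribed from the preset list above;
// object shape and key names are assumptions for illustration.
const presetConfigs: Record<string, { budget: number; window: number }> = {
  default:        { budget: 100_000, window: 10 },
  chat:           { budget: 32_000,  window: 20 },
  coding:         { budget: 100_000, window: 15 },
  research:       { budget: 128_000, window: 10 },
  "long-running": { budget: 200_000, window: 8 },
};
```

Note the trade-off: long-running agents get the largest budget but the smallest turn window, since aggressive masking matters most when conversations never end.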

Framework Support

  • OpenAI SDK: @fold/sdk/openai
  • Anthropic SDK: @fold/sdk/anthropic
  • Vercel AI SDK: @fold/sdk/vercel-ai
  • LangChain / LangGraph: @fold/sdk/langchain