@fold/sdk API Reference

Complete reference for all functions, classes, and types in the Fold SDK.

Easy API

The simplest way to use Fold. Handles all optimization automatically.

fold(options?)
Create a context optimizer with sensible defaults
import { fold } from '@fold/sdk'

// Default: 100K budget, 10 turn window
const ctx = fold()

// Use a preset
const ctx = fold("coding")

// Custom options
const ctx = fold({
  budget: 50_000,
  model: "gpt-4o",
  window: 15
})

Presets

| Preset | Budget | Window |
|---|---|---|
| (default) | 100K | 10 turns |
| "chat" | 32K | 20 turns |
| "coding" | 100K | 15 turns |
| "research" | 128K | 10 turns |
| "long-running" | 200K | 8 turns |

Returns

FoldContext - A context optimizer instance

restore(data)
Restore a context from saved state
import { fold, restore } from '@fold/sdk'

// Save context
const ctx = fold()
ctx.observe("Hello!", "user")
const saved = ctx.save()
localStorage.setItem('context', JSON.stringify(saved))

// Later, restore it
const data = JSON.parse(localStorage.getItem('context'))
const restored = restore(data)

Parameters

  • data - Serialized context from ctx.save()

FoldContext Methods

Methods available on the context object returned by fold().

Content Methods
Add content to the context

system(prompt: string)

Set the system prompt. Called once at the start.

ctx.system("You are a helpful coding assistant.")

think(content: string)

Add reasoning or thoughts. Protected from aggressive optimization.

ctx.think("I should search for the configuration file first.")

act(action: object, tool?: string)

Add an action or tool call. Include the tool name for better categorization.

ctx.act({ path: "/app/config.json" }, "read_file")
ctx.act({ query: "SELECT * FROM users" }, "database")

observe(result: string, source?: string)

Add an observation or result. These are the most aggressively optimized.

ctx.observe("File contents: { port: 3000 }", "read_file")
ctx.observe("Query returned 42 rows", "database")

Output Methods
Get optimized context for your LLM

messages()

Get optimized messages array for LLM APIs. Compatible with OpenAI and Anthropic.

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: ctx.messages(),  // Optimized!
})

compile()

Get full compilation result with messages and optimization stats.

const { messages, optimization, stopSignal } = ctx.compile()
console.log(optimization.savingsPercent) // 45

Stop Signal Methods
Detect when to stop the agent loop

stop()

Check if the agent should stop. Returns true when the agent is stuck in a loop, failing repeatedly, or has achieved its goal.

while (true) {
  // ... agent logic
  if (ctx.stop()) break
}

reason()

Get the stop reason when stop() returns true.

if (ctx.stop()) {
  const signal = ctx.reason()
  console.log(signal.reason)      // "repeated_failure"
  console.log(signal.confidence)  // 0.85
}

Stop Reasons

| Reason | Description |
|---|---|
| repeated_failure | 3+ consecutive errors |
| no_progress | Same action repeated 5x |
| goal_achieved | Success indicators detected |
| turn_limit | 50+ turns executed |

Statistics Methods
Track optimization and savings

saved()

Quick summary of token savings and cost reduction.

console.log(ctx.saved())
// { tokens: 5000, percent: 45, cost: 0.05 }

stats()

Detailed statistics about context and optimization.

console.log(ctx.stats())
// {
//   turnCount: 25,
//   totalTokens: 12000,
//   maskedTokens: 7000,
//   summarizedTokens: 0,
//   compressionRatio: 0.58
// }

Persistence Methods
Save and restore context across sessions

save()

Export context for persistence. Returns a serializable object.

const data = ctx.save()
localStorage.setItem('context', JSON.stringify(data))

Advanced API (ContextSession)

For full control over optimization behavior, use ContextSession directly.

new ContextSession(config)
Create a context session with full configuration
import { ContextSession } from '@fold/sdk'

const session = new ContextSession({
  budget: 100_000,
  model: 'claude-sonnet-4-5-20250929',
  masking: {
    enabled: true,
    strategy: 'hybrid',
    windowSize: 15,
    preserveAnchors: true,
  },
  summarization: {
    enabled: true,
    trigger: 'budget_pressure',
    threshold: 0.8,
  },
})

Configuration Options

| Option | Type | Description |
|---|---|---|
| budget | number | Token budget limit |
| model | string | Model name for tokenizer selection |
| masking.enabled | boolean | Enable observation masking |
| masking.strategy | string | "rolling_window" \| "token_budget" \| "relevance_scored" \| "hybrid" |
| masking.windowSize | number | Number of turns to keep unmasked |
| masking.preserveAnchors | boolean | Preserve anchor turns when masking |
| summarization.enabled | boolean | Enable automatic summarization |
| summarization.trigger | string | "budget_pressure" \| "turn_count" |
| summarization.threshold | number | Budget fraction that triggers summarization (e.g. 0.8) |

Masking Strategies

rolling_window
Keep the last N turns full, mask earlier observations with placeholders. Best for general use.
token_budget
Mask oldest observations until under budget. Best for strict token limits.
relevance_scored
Score relevance of each turn and mask lowest first. Best for query-heavy tasks.
hybrid
Combine window + budget + relevance strategies. Maximum efficiency (recommended).
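
For instance, a session pinned to a hard token cap might favor token_budget over the hybrid default shown earlier (a configuration sketch; same shape as the constructor example above):

```typescript
import { ContextSession } from '@fold/sdk'

const session = new ContextSession({
  budget: 32_000,
  model: 'gpt-4o',
  masking: {
    enabled: true,
    strategy: 'token_budget',  // mask oldest observations until under budget
  },
})
```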

Summarizers

Built-in summarizer factories for different LLM providers.

import {
  createOpenAISummarizer,      // OpenAI API
  createAnthropicSummarizer,   // Claude API
  createFetchSummarizer,       // Any OpenAI-compatible API
  createSimpleSummarizer,      // Offline/heuristic-based
} from '@fold/sdk/summarizers'

// OpenAI
session.setSummarizationCallback(
  createOpenAISummarizer(new OpenAI(), { model: 'gpt-4o-mini' })
)

// Groq (fast inference)
session.setSummarizationCallback(
  createFetchSummarizer({
    url: 'https://api.groq.com/openai/v1/chat/completions',
    apiKey: process.env.GROQ_API_KEY,
    model: 'llama-3.1-8b-instant',
  })
)

// Local Ollama
session.setSummarizationCallback(
  createFetchSummarizer({
    url: 'http://localhost:11434/v1/chat/completions',
    apiKey: 'ollama',
    model: 'llama3.2',
  })
)
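
By analogy with the OpenAI factory above, the Claude and offline variants might be wired up as follows. The signatures here are assumed to mirror createOpenAISummarizer and are not confirmed by this reference; check the package's types before relying on them:

```typescript
import Anthropic from '@anthropic-ai/sdk'
import {
  createAnthropicSummarizer,
  createSimpleSummarizer,
} from '@fold/sdk/summarizers'

// Claude-backed summarization (assumed signature: client + options)
session.setSummarizationCallback(
  createAnthropicSummarizer(new Anthropic(), { model: 'claude-3-5-haiku-latest' })
)

// Offline/heuristic fallback for tests or air-gapped environments
// (assumes the factory takes no required arguments)
session.setSummarizationCallback(createSimpleSummarizer())
```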