The Real Context Workflow
Think in terms of turn reduction, not knowledge modeling. Your input is a big array of turns. Your output is a smaller context window for the next model call.
What The Problem Actually Looks Like
In production, a thread is rarely just user and assistant messages. It also includes tool results, CRM data, system events, previous summaries, internal guidance, and repeated restatements of the same issue.
```json
{
  "turns": [
    "User: We still cannot log in after yesterday's Okta cutover.",
    "Agent: Pulling account metadata and auth logs.",
    "Tool crm_lookup: account=acme_corp tier=enterprise billing=current renewal=2026-07-01",
    "Tool auth_audit: 14 failed SAML assertions since 09:12 UTC; issuer mismatch detected.",
    "Internal note: Customer is not delinquent. Keep ticket in support queue.",
    "Previous ticket: promised service credit if outage exceeds 4 hours.",
    "Slack escalation: INC-4821 open; workaround is manual issuer override.",
    "User: Three teams lost admin access after yesterday's metadata change.",
    "Tool statuspage: degraded identity service in us-east-1.",
    "Agent: Need a concise handoff context before the next model call."
  ]
}
```

That array is still far smaller than what many teams send in reality. The important point is that it already contains overlap, stale details, and the same issue stated in different ways.
Step 1: First Reduction With POST /context
Use POST /context for the first compact pass: send raw turns and get back a normalized set of facts, each scored by salience.
```shell
curl -X POST http://localhost:9300/context \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{
    "turns": [
      "User: We still cannot log in after yesterday'\''s Okta cutover.",
      "Tool crm_lookup: account=acme_corp tier=enterprise billing=current",
      "Tool auth_audit: issuer mismatch detected after IdP migration.",
      "Slack escalation: workaround is manual issuer override."
    ],
    "maxFacts": 12
  }'
```

The response is the reduced window:

```json
{
  "facts": [
    { "predicate": "customer_tier", "args": ["acme_corp", "enterprise"], "salience": 0.98 },
    { "predicate": "billing_status", "args": ["acme_corp", "current"], "salience": 0.80 },
    { "predicate": "current_issue", "args": ["acme_corp", "saml_issuer_mismatch"], "salience": 0.97 },
    { "predicate": "workaround", "args": ["acme_corp", "manual_issuer_override"], "salience": 0.88 }
  ],
  "factsReturned": 4,
  "contradictions": 0,
  "newFactsExtracted": 4
}
```

Step 2: Ask The Next Question With POST /context/optimize
The first pass is broad. The next pass should be question-specific. Once you know what the next model call is trying to do, use /context/optimize to narrow the window further.
```shell
curl -X POST http://localhost:9300/context/optimize \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{
    "sessionId": "support-thread-4821",
    "maxFacts": 10,
    "goals": [
      {"predicate": "next_best_action", "args": ["acme_corp"]},
      {"predicate": "service_credit_applicable", "args": ["acme_corp"]}
    ]
  }'
```

This is where the context window becomes operational instead of merely descriptive. The model stops seeing the whole incident and starts seeing the subset needed to answer the next action question.
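To build intuition for what the goals field does, here is a rough local approximation of goal-directed narrowing. This is an illustration only, not the service's actual ranking logic; the fact shape matches the /context response above, and `narrow_to_goals` is a hypothetical helper, not part of the API.

```python
def narrow_to_goals(facts, goals, max_facts=10):
    """Sketch of goal-directed narrowing: keep facts whose predicate or
    arguments overlap the stated goals, then rank survivors by salience.
    Illustrative approximation only."""
    goal_predicates = {g["predicate"] for g in goals}
    goal_args = {arg for g in goals for arg in g["args"]}
    relevant = [
        f for f in facts
        if f["predicate"] in goal_predicates or goal_args & set(f["args"])
    ]
    # Highest-salience facts win the limited slots in the window.
    relevant.sort(key=lambda f: f["salience"], reverse=True)
    return relevant[:max_facts]
```

On a thread this small, every fact mentions acme_corp and survives the overlap check; the filtering earns its keep as a thread accumulates facts about other accounts and side issues.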
Step 3: Stop Re-Sending The Same Context With POST /context/diff
This is the step most teams miss. If the conversation continues, do not send the whole optimized window again. Keep the same sessionId and ask for the delta.
```shell
curl -X POST http://localhost:9300/context/diff \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{
    "sessionId": "support-thread-4821",
    "maxFacts": 10
  }'
```

The response contains only the delta:

```json
{
  "previousWindowId": "ctx-01",
  "currentWindowId": "ctx-02",
  "added": [
    { "predicate": "temporary_access_restored", "args": ["acme_corp", "true"], "salience": 0.92 }
  ],
  "removed": [],
  "unchanged": 9,
  "fullRefreshRecommended": false
}
```

That is the production benefit: later model calls pay for the change, not for the entire thread history again.
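One way to consume that response is to keep a local cache of the current window and fold each diff into it. A minimal sketch, assuming the fact shapes shown above; `apply_context_diff` is a hypothetical helper name, not part of the API:

```python
def apply_context_diff(cached_window, diff):
    """Fold a /context/diff response into a locally cached window.
    Facts are keyed by (predicate, args): `removed` entries are dropped,
    `added` entries appended. Returns None when the service recommends
    a full refresh, signaling the caller to refetch instead."""
    if diff.get("fullRefreshRecommended"):
        return None  # caller should do a fresh /context/optimize pass
    removed = {(f["predicate"], tuple(f["args"])) for f in diff.get("removed", [])}
    kept = [
        f for f in cached_window
        if (f["predicate"], tuple(f["args"])) not in removed
    ]
    return kept + diff.get("added", [])
```

Keying on (predicate, args) rather than object identity means the cache stays correct even when the service re-serializes the same fact with a different salience score.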
Step 4: End The Thread Cleanly
When the thread is finished, clear the diff snapshot:
```shell
curl -X POST http://localhost:9300/context/session/clear \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{"sessionId":"support-thread-4821"}'
```

Formatting For The Model
Most teams take the returned entries and flatten them into a short system or tool message. The model does not need the original transcript if the compact context already captures the operational state.
```python
import requests
from openai import OpenAI

client = OpenAI()

# Goal-directed second pass; this assumes the optimize response exposes
# the reduced window under an "entries" key.
ctx = requests.post(
    "http://localhost:9300/context/optimize",
    headers={"X-Tenant-ID": "default"},
    json={
        "sessionId": "support-thread-4821",
        "maxFacts": 10,
        "goals": [{"predicate": "next_best_action", "args": ["acme_corp"]}],
    },
).json()

# Flatten each fact into a one-line predicate(args) bullet.
context_lines = [
    f"- {entry['predicate']}({', '.join(entry['args'])})"
    for entry in ctx["entries"]
]

# The model sees only the reduced context, never the raw transcript.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Use only this reduced context:\n" + "\n".join(context_lines)},
        {"role": "user", "content": "Write the next support reply."},
    ],
)
```

Where The Other Surfaces Fit
- API: best when your app already owns the turn array and prompt assembly.
- Python SDK: best when you want `optimize_context()`, `diff_context()`, and session cleanup in app code.
- TypeScript SDK: best when you want `contextWindow()`, `optimizeContext()`, `diffContext()`, and `clearContextSession()` directly in app code.
- MCP: best when your agent runtime already uses tool calling. Use MCP `context` for salience retrieval and pair it with the HTTP context endpoints when you need goal-specific windows.
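Whichever surface you use, one habit carries over from Step 4: clear the session when the thread ends, even on error paths. A minimal Python sketch using a context manager; `context_session` is a hypothetical wrapper, and the `post` argument is injectable so the cleanup logic can be exercised without a live server:

```python
from contextlib import contextmanager


@contextmanager
def context_session(session_id, base_url="http://localhost:9300",
                    tenant_id="default", post=None):
    """Yield the session id, then clear the diff snapshot on exit,
    even if the work inside the block raises."""
    if post is None:  # default to requests.post, imported lazily
        import requests
        post = requests.post
    try:
        yield session_id
    finally:
        post(
            f"{base_url}/context/session/clear",
            headers={"Content-Type": "application/json",
                     "X-Tenant-ID": tenant_id},
            json={"sessionId": session_id},
        )
```

Usage follows the workflow above: open the block with the thread's session id, run the optimize and diff calls inside it, and the clear request fires automatically when the block exits.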
If You Want The Backend Details
Predicates, rules, scopes, salience scoring, truth maintenance, and memory lifecycle still exist. They just belong in the backend explanation, not at the front of the product story.
If that is what you need next, go to How It Works on the Backend.