The Real Context Workflow

Think in terms of turn reduction, not knowledge modeling. Your input is a big array of turns. Your output is a smaller context window for the next model call.

Do not optimize the wrong thing. Most teams lose time trying to make the raw transcript "better." The practical problem is simpler: too many turns go into the prompt, so the model pays attention to noise and you pay for the noise too.

What The Problem Actually Looks Like

In production, a thread is rarely just user and assistant messages. It also includes tool results, CRM data, system events, previous summaries, internal guidance, and repeated restatements of the same issue.

{
  "turns": [
    "User: We still cannot log in after yesterday's Okta cutover.",
    "Agent: Pulling account metadata and auth logs.",
    "Tool crm_lookup: account=acme_corp tier=enterprise billing=current renewal=2026-07-01",
    "Tool auth_audit: 14 failed SAML assertions since 09:12 UTC; issuer mismatch detected.",
    "Internal note: Customer is not delinquent. Keep ticket in support queue.",
    "Previous ticket: promised service credit if outage exceeds 4 hours.",
    "Slack escalation: INC-4821 open; workaround is manual issuer override.",
    "User: Three teams lost admin access after yesterday's metadata change.",
    "Tool statuspage: degraded identity service in us-east-1.",
    "Agent: Need a concise handoff context before the next model call."
  ]
}

That array is still far smaller than what many teams send in reality. The important point is that it already contains overlap, stale details, and the same issue stated in different ways.
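Before the first API call, a cheap client-side pass can at least drop byte-identical repeats so you do not pay to transmit them. This is a hypothetical helper, not part of the product; the /context endpoint handles the semantic overlap.

```python
def dedupe_turns(turns):
    """Drop exact-duplicate turns while preserving order.

    Hypothetical client-side pre-pass: /context handles semantic overlap,
    but there is no reason to send byte-identical repeats in the request body.
    """
    seen = set()
    kept = []
    for turn in turns:
        normalized = turn.strip()
        if normalized not in seen:
            seen.add(normalized)
            kept.append(normalized)
    return kept
```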


Step 1: First Reduction With POST /context

Use POST /context when you want the first compact pass. Send raw turns; get back a normalized state containing the highest-salience facts.

curl -X POST http://localhost:9300/context \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{
    "turns": [
      "User: We still cannot log in after yesterday's Okta cutover.",
      "Tool crm_lookup: account=acme_corp tier=enterprise billing=current",
      "Tool auth_audit: issuer mismatch detected after IdP migration.",
      "Slack escalation: workaround is manual issuer override."
    ],
    "maxFacts": 12
  }'
{
  "facts": [
    { "predicate": "customer_tier", "args": ["acme_corp", "enterprise"], "salience": 0.98 },
    { "predicate": "billing_status", "args": ["acme_corp", "current"], "salience": 0.80 },
    { "predicate": "current_issue", "args": ["acme_corp", "saml_issuer_mismatch"], "salience": 0.97 },
    { "predicate": "workaround", "args": ["acme_corp", "manual_issuer_override"], "salience": 0.88 }
  ],
  "factsReturned": 4,
  "contradictions": 0,
  "newFactsExtracted": 4
}
Important: you do not need to invent predicate names up front to use this flow. The product can extract and normalize structure from turns. The predicate-style output is the backend representation of the reduced window.
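When you do want to hand the reduced window to a model, the facts array flattens into prompt lines in a few lines of code. This sketch assumes the response shape shown above (a `facts` list with `predicate`, `args`, and `salience`); the `max_lines` cap is a client-side choice, not an API parameter.

```python
def flatten_facts(facts, max_lines=12):
    """Turn the facts array from POST /context into prompt-ready lines,
    highest salience first. Response shape taken from the example above."""
    ranked = sorted(facts, key=lambda f: f["salience"], reverse=True)
    return [
        f"- {f['predicate']}({', '.join(f['args'])})"
        for f in ranked[:max_lines]
    ]

facts = [
    {"predicate": "customer_tier", "args": ["acme_corp", "enterprise"], "salience": 0.98},
    {"predicate": "billing_status", "args": ["acme_corp", "current"], "salience": 0.80},
]
# flatten_facts(facts) → ["- customer_tier(acme_corp, enterprise)",
#                         "- billing_status(acme_corp, current)"]
```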

Step 2: Ask The Next Question With POST /context/optimize

The first pass is broad. The next pass should be question-specific. Once you know what the next model call is trying to do, use /context/optimize to narrow the window further.

curl -X POST http://localhost:9300/context/optimize \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{
    "sessionId": "support-thread-4821",
    "maxFacts": 10,
    "goals": [
      {"predicate":"next_best_action","args":["acme_corp"]},
      {"predicate":"service_credit_applicable","args":["acme_corp"]}
    ]
  }'

This is where the context window becomes operational instead of merely descriptive. The model stops seeing the whole incident and starts seeing the subset needed to answer the next action question.
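To make "question-specific" concrete, here is a rough client-side sketch of the narrowing idea: keep only facts that mention an entity named in a goal. The real /context/optimize endpoint does this server-side, layering salience scoring and rules on top; this function is illustration only, not the product's algorithm.

```python
def narrow_to_goals(facts, goals):
    """Keep only facts that share an entity with one of the goals.
    A sketch of question-specific narrowing; /context/optimize does the
    real version server-side with salience scoring and rules."""
    goal_entities = {arg for goal in goals for arg in goal["args"]}
    return [f for f in facts if goal_entities & set(f["args"])]

goals = [{"predicate": "next_best_action", "args": ["acme_corp"]}]
facts = [
    {"predicate": "current_issue", "args": ["acme_corp", "saml_issuer_mismatch"]},
    {"predicate": "region_status", "args": ["us-east-1", "degraded"]},
]
# narrow_to_goals(facts, goals) keeps only the acme_corp fact
```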


Step 3: Stop Re-Sending The Same Context With POST /context/diff

This is the step most teams miss. If the conversation continues, do not send the whole optimized window again. Keep the same sessionId and ask for the delta.

curl -X POST http://localhost:9300/context/diff \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{
    "sessionId": "support-thread-4821",
    "maxFacts": 10
  }'
{
  "previousWindowId": "ctx-01",
  "currentWindowId": "ctx-02",
  "added": [
    { "predicate": "temporary_access_restored", "args": ["acme_corp", "true"], "salience": 0.92 }
  ],
  "removed": [],
  "unchanged": 9,
  "fullRefreshRecommended": false
}

That is the production benefit: later model calls pay for the change, not for the entire thread history again.
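Applying the delta means patching a locally cached window rather than rebuilding it. A minimal sketch, assuming you key the cache by (predicate, args); that key scheme is an assumption of this example, not a product requirement.

```python
def apply_diff(window, diff):
    """Apply a /context/diff response to a locally cached window keyed by
    (predicate, tuple(args)). Client-side bookkeeping sketch: the key
    scheme is an assumption, not something the API prescribes."""
    if diff.get("fullRefreshRecommended"):
        # The snapshot has drifted too far; patching would be wrong.
        raise RuntimeError("window is stale: re-run /context/optimize instead")
    for fact in diff.get("removed", []):
        window.pop((fact["predicate"], tuple(fact["args"])), None)
    for fact in diff.get("added", []):
        window[(fact["predicate"], tuple(fact["args"]))] = fact
    return window
```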


Step 4: End The Thread Cleanly

When the thread is finished, clear the diff snapshot:

curl -X POST http://localhost:9300/context/session/clear \
  -H "Content-Type: application/json" \
  -H "X-Tenant-ID: default" \
  -d '{"sessionId":"support-thread-4821"}'

Formatting For The Model

Most teams take the returned entries and flatten them into a short system or tool message. The model does not need the original transcript if the compact context already captures the operational state.

import requests
from openai import OpenAI

client = OpenAI()

ctx = requests.post(
    "http://localhost:9300/context/optimize",
    headers={"X-Tenant-ID": "default"},
    json={
        "sessionId": "support-thread-4821",
        "maxFacts": 10,
        "goals": [{"predicate": "next_best_action", "args": ["acme_corp"]}],
    },
).json()

context_lines = [
    f"- {entry['predicate']}({', '.join(entry['args'])})"
    for entry in ctx["facts"]
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Use only this reduced context:\n" + "\n".join(context_lines)},
        {"role": "user", "content": "Write the next support reply."},
    ],
)

If You Want The Backend Details

Predicates, rules, scopes, salience scoring, truth maintenance, and memory lifecycle still exist. They just belong in the backend explanation, not at the front of the product story.

If that is what you need next, go to How It Works on the Backend.


What's Next?

API Reference →

Every context endpoint and request shape

SDKs →

Call the same workflow from Python or TypeScript

MCP →

Use the context tool from Cursor, Claude Desktop, or any MCP client

How It Works →

Backend mechanics, memory lifecycle, predicates, and rules