97% Less Context.
Same Answers.
GPT-4o is $15/M tokens. Claude is $15/M. Every request is stuffed with 150K tokens of irrelevant context.
NocturnusAI's goal-driven engine delivers only the facts that matter — 500 facts in, 15 out, zero information loss. Stop subsidizing OpenAI with wasted tokens and start optimizing your context.
The context optimization engine for production AI agents.
Requires Docker with Compose V2 installed. Or install with an AI prompt.
Pay for Signal.
Not Noise.
OpenAI charges $15 per million tokens. Anthropic charges $15. Google charges $10. And your agents waste 95% of every request on irrelevant context. NocturnusAI uses goal-driven backward chaining to deliver only what your agent actually needs — slashing your LLM bill by 97%.
The Optimization Pipeline
Every call to POST /context runs this pipeline in under 50ms
Request:

{
  "turns": [
    "Acme Corp is on the enterprise plan.",
    "They have a $2M contract.",
    "24/7 SLA support included."
  ]
}

Response:

{
  "facts": [
    { "predicate": "customer_tier", "args": ["acme_corp", "enterprise"], "salience": 0.95 },
    { "predicate": "contract_value", "args": ["acme_corp", "2000000"], "salience": 0.92 },
    { "predicate": "sla_tier", "args": ["acme_corp", "24_7"], "salience": 0.90 }
  ],
  "factsReturned": 3,
  "totalFactsInKB": 127,
  "contradictions": 0
}

Turns in, facts out, GPT-4o in. Drop into any OpenAI workflow in minutes.
Your Agent + OpenAI, Before and After
Same question to GPT-4o. One costs $2.25. The other costs $0.003.
# Stuff everything into the system prompt
import openai

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": entire_knowledge_base,
            # 500 facts, 47 rules = 150K tokens
        },
        {"role": "user", "content": "What plan is Acme on?"},
    ],
)
# "I believe they're on the premium plan..."
# Wrong. $2.25 wasted. 95% of context irrelevant.

# Turns in, facts out
import requests

facts = requests.post(
    "http://localhost:9300/context",
    json={"turns": ["Acme is on enterprise", "They have 24/7 SLA"]},
).json()["facts"]

response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": format_facts(facts),
            # 3 facts, 200 tokens
        },
        {"role": "user", "content": "What plan is Acme on?"},
    ],
)
# "Acme Corp is on the enterprise plan."
# Correct. Sourced. $0.003. ✓

Pay for Signal. Not Noise.
Every token costs money. NocturnusAI's context engine ensures you only pay for facts that matter — then adds verified reasoning, memory lifecycle, and consistency on top.
97% Context Reduction
POST /context with your conversation turns. Get back ranked facts. That's the whole API. NocturnusAI extracts, deduplicates, and ranks automatically — 97% fewer tokens billed to OpenAI or Claude.
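The before/after example above passes the ranked facts through a format_facts helper. That helper isn't part of the API — a minimal sketch, assuming each fact arrives as a predicate/args/salience object like the pipeline response shows:

```python
def format_facts(facts):
    """Render ranked facts as plain prompt lines, highest salience first.

    Assumes each fact is a dict with "predicate", "args", and "salience"
    keys, matching the /context response shape shown earlier.
    """
    lines = []
    for fact in sorted(facts, key=lambda f: f["salience"], reverse=True):
        args = ", ".join(fact["args"])
        lines.append(f"- {fact['predicate']}({args})")
    return "Known facts:\n" + "\n".join(lines)
```

Any rendering works — the point is that three short lines replace a 150K-token knowledge dump.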
From $2.25 to $0.003 Per Request
Send your conversation turns to NocturnusAI, feed the ranked facts to GPT-4o or Claude. Two HTTP calls instead of one bloated prompt. Works with any LLM provider.
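The headline numbers are straight token arithmetic at the $15-per-million rate quoted above:

```python
PRICE_PER_M = 15  # $ per million tokens, the rate quoted above

before = 150_000 / 1_000_000 * PRICE_PER_M  # whole knowledge base in the prompt
after = 200 / 1_000_000 * PRICE_PER_M       # three ranked facts from /context

print(f"before: ${before:.2f}  after: ${after:.3f}")  # → before: $2.25  after: $0.003
```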
Cheaper and More Accurate
Less context means better answers. NocturnusAI catches contradictions before they reach GPT-4o, deduplicates across sources, and ranks by salience. Your agent reasons over signal, not noise.
Plain English In, Verified Facts Out
POST /extract with any text. NocturnusAI calls your LLM to pull out structured facts and stores them automatically. No schema design, no parsing code, no mapping logic.
Ask Questions, Get Grounded Answers
POST /synthesize with a natural language question. NocturnusAI queries its fact store, runs inference, and returns a sourced answer — not a hallucinated guess from token probabilities.
9 MCP Tools, Zero Integration Work
Connect any MCP-compatible agent, IDE, or framework with a two-line config. tell, ask, teach, forget, recall, context — a complete reasoning toolkit with no integration code.
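The exact config shape varies by MCP client, and the server's URL path is an assumption here — check the docs for your client — but for a client that supports HTTP servers it looks roughly like:

```json
{
  "mcpServers": {
    "nocturnusai": {
      "url": "http://localhost:9300/mcp"
    }
  }
}
```

Once the client picks up the entry, the tell/ask/teach/forget/recall/context tools become available to the agent.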
Salience-Ranked Memory
Composite scoring keeps the most relevant facts surfaced for your agent's context window. Episodic patterns consolidate into semantic summaries. Low-relevance facts decay automatically.
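The engine's exact weighting is internal, but a composite score of this shape — base relevance, recency decay, and reinforcement from use — is a reasonable mental model. Weights and half-life below are illustrative, not the engine's values:

```python
import math
import time

def salience(base, last_used_ts, use_count, now=None, half_life=86_400):
    """Illustrative composite score: relevance x recency x reinforcement."""
    now = time.time() if now is None else now
    # Recency halves every half_life seconds (one day by default).
    recency = math.exp(-(now - last_used_ts) * math.log(2) / half_life)
    # Reinforcement saturates as a fact is used repeatedly.
    reinforcement = 1 - math.exp(-use_count / 10)
    return 0.6 * base + 0.3 * recency + 0.1 * reinforcement
```

Under any scoring of this shape, stale low-use facts sink and recently touched, highly relevant facts stay at the top of the context window.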
Truth Maintenance System
Retract a fact and every conclusion that depended on it disappears automatically. No stale inferences, no manual cleanup — the knowledge base stays consistent by design.
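This is the classic justification-based TMS: every derived fact records what supports it, and retraction cascades. A toy sketch of the semantics (not the engine's implementation):

```python
class TMS:
    """Toy truth maintenance: derived facts vanish with their supports."""

    def __init__(self):
        self.facts = set()
        self.justifications = {}  # derived fact -> set of supporting facts

    def assert_fact(self, fact, supports=()):
        self.facts.add(fact)
        if supports:
            self.justifications[fact] = set(supports)

    def retract(self, fact):
        self.facts.discard(fact)
        # Any conclusion resting on the retracted fact goes with it.
        for derived, supports in list(self.justifications.items()):
            if fact in supports:
                del self.justifications[derived]
                self.retract(derived)  # cascade transitively

tms = TMS()
tms.assert_fact("enterprise(acme)")
tms.assert_fact("premium_sla(acme)", supports=["enterprise(acme)"])
tms.retract("enterprise(acme)")
# premium_sla(acme) is gone too — no stale inferences left behind
```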
Temporal Atoms
Every fact carries validFrom, validUntil, and TTL fields. Facts auto-expire. Query what was true at any point in time. Agents reason over history, not just the present snapshot.
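Point-in-time queries fall out of those fields directly. A sketch of the filter — field names are taken from the description above, the storage format is assumed:

```python
def valid_at(facts, t):
    """Return facts whose [validFrom, validUntil) window covers time t."""
    return [
        f for f in facts
        if f["validFrom"] <= t and (f["validUntil"] is None or t < f["validUntil"])
    ]

history = [
    {"fact": "plan(acme, starter)", "validFrom": 0, "validUntil": 100},
    {"fact": "plan(acme, enterprise)", "validFrom": 100, "validUntil": None},
]
# valid_at(history, 50) sees the starter plan; valid_at(history, 500) sees enterprise
```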
ACID Transactions
Multi-agent systems write concurrently. Transactions ensure atomic commits with contradiction detection — agents can explore hypotheticals without polluting shared state.
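The guarantee is all-or-nothing: a batch either lands whole or not at all, and a contradiction anywhere rejects the whole batch. A toy sketch of those semantics (the real transaction API isn't shown here):

```python
class KB:
    """Toy atomic commit: a staged batch lands whole or not at all."""

    def __init__(self):
        self.facts = set()

    def commit(self, staged, contradicts):
        # Reject the whole batch if any staged fact contradicts the KB
        # or another staged fact — nothing is applied on failure.
        for new in staged:
            if any(contradicts(new, old) for old in self.facts | set(staged) - {new}):
                return False
        self.facts |= set(staged)
        return True

kb = KB()
kb.commit({"tier(acme, enterprise)"}, contradicts=lambda a, b: False)
ok = kb.commit(
    {"tier(acme, starter)"},
    contradicts=lambda a, b: a.startswith("tier(acme") and b.startswith("tier(acme") and a != b,
)
# ok is False; the KB still holds only the enterprise fact
```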
Production Durability
WAL + snapshots for crash recovery. Leader/follower replication for read scaling. Prometheus metrics. Kubernetes-ready health probes. Self-hosted, your data, your infrastructure.
Universal Protocol Support
MCP, REST, Python SDK, TypeScript SDK, A2A agent discovery. Whatever your stack, NocturnusAI plugs in. New protocols don't require rewriting your knowledge layer.
Not a Plugin. A Cost Engine.
Other tools sit on top of your LLM and add tokens. Nocturnus sits beneath your agents and removes them — delivering only what matters, cutting context costs by 97%, while making every answer provable and traceable.
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from nocturnusai.langchain import get_nocturnusai_tools

# Point your agent at the logic engine
tools = get_nocturnusai_tools("http://localhost:9300")
# tell, ask, teach, forget, recall, context —
# all backed by the Hexastore + inference engine

llm = ChatAnthropic(model="claude-sonnet-4-20250514")
# prompt: your agent's ChatPromptTemplate
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

result = executor.invoke({
    "input": "Is Acme Corp eligible for premium SLA?"
})
# Agent reasons over verified facts, not LLM memory.
# Answer is provable. Traceable. Consistent.

9 MCP tools, all backed by the logic engine
Up and Running in 60 Seconds
No signup. No cloud dependency. No schemas to design. Production-grade infrastructure, self-hosted, on your terms.
Deploy the Logic Engine
One curl command. Requires Docker with Compose V2 already installed. The installer pulls the image, starts the server via Docker Compose, waits for a healthy status, and installs the native CLI binary. Nocturnus is live on port 9300 in under 30 seconds.
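If you'd rather skip the installer, the Compose setup it produces is roughly this shape. The image name, volume path, and health endpoint below are assumptions — check the install docs for the real values:

```yaml
services:
  nocturnusai:
    image: nocturnusai/server:latest   # hypothetical image name
    ports:
      - "9300:9300"
    volumes:
      - ./data:/var/lib/nocturnusai    # WAL + snapshots survive restarts
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9300/health"]
      interval: 5s
```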
Load Your World
Assert facts about your domain: customers, products, rules, state, relationships. Everything is structured, typed, and time-aware. Rules you define teach the engine what to derive. The KB grows as your world grows.
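The fact shape follows the /context response shown earlier; everything beyond predicate/args — the temporal fields, the rule syntax, the endpoint name — is an assumption here, for illustration only:

```python
fact = {
    "predicate": "contract_value",
    "args": ["acme_corp", "2000000"],
    "validFrom": "2025-01-01T00:00:00Z",  # temporal fields per the Temporal Atoms section
    "validUntil": None,
}

# Illustrative rule syntax — teaches the engine what to derive
rule = "premium_sla(X) :- customer_tier(X, enterprise)"

# requests.post("http://localhost:9300/tell", json=fact)  # hypothetical endpoint
```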
Connect Your Agents
Point any MCP-compatible framework, the Python SDK, TypeScript SDK, or direct HTTP at the running server. Context optimization kicks in automatically — your agents get provable answers at 97% lower token cost.
Stop paying for tokens your agent doesn't need