OpenAI charges $15/M tokens. Your agents waste 95% of them.

97% Less Context.
Same Answers.

GPT-4o is $15/M tokens. Claude is $15/M. Every request stuffs 150K tokens of irrelevant context.

NocturnusAI's goal-driven engine delivers only the facts that matter — 500 facts in, 15 out, zero information loss. Stop subsidizing OpenAI with wasted tokens and start optimizing your context.

The context optimization engine for production AI agents.

$ curl -fsSL https://raw.githubusercontent.com/Auctalis/nocturnusai/main/install.sh | bash

Requires Docker with Compose V2 installed. Or install with an AI prompt.

97% Context Reduction · Goal-Driven Optimization · Backward + Forward Chaining · Hexastore (6-way indexed) · Truth Maintenance System · Salience Memory · MCP · A2A · REST
turns in, facts out — that's the whole API
# 1. Send your conversation turns
$ POST /context  { "turns": ["Acme is on enterprise plan", "They have 24/7 SLA"] }
✓ 5 facts extracted & ranked — 200 tokens
# 2. Feed those facts to GPT-4o or Claude
$ openai.chat(system=facts, user="Is Acme eligible for premium SLA?")
✓ "Yes — enterprise tier qualifies for 24/7 SLA support."
sourced from: customer_tier(acme, enterprise) · sla_tier(acme, 24_7)
Token cost for that answer: $2.25 (150K tokens) → $0.003 (200 tokens)
Context Management Engine

Pay for Signal.
Not Noise.

OpenAI charges $15 per million tokens. Anthropic charges $15. Google charges $10. And your agents waste 95% of every request on irrelevant context. NocturnusAI uses goal-driven backward chaining to deliver only what your agent actually needs — slashing your LLM bill by 97%.

Without NocturnusAI Every RAG pipeline
$2.25 per request to OpenAI alone
150K tokens at GPT-4o's $15/M rate — and that's before Claude or Gemini costs
$54,000/day at scale
1,000 requests/hr × 24 hr × $2.25 = token spend that grows linearly with usage
Worse accuracy, higher cost
Contradictory and stale facts in context degrade LLM reasoning quality
Tokens billed per request
95% wasted spend
With NocturnusAI Goal-driven optimization
$0.01 per request
820 tokens — only goal-relevant facts, ranked by salience, with full provenance
Goal-driven backward chaining
Tell us the question — we trace exactly which facts matter via SLD resolution
Cheaper and more accurate
Contradictions caught before they reach the LLM — fewer tokens, better answers
Tokens billed per request
3%
97% cost reduction — every remaining token earns its keep
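The before/after figures above are plain per-token arithmetic. A quick sketch at the quoted $15/M rate:

```python
RATE = 15 / 1_000_000  # $15 per million tokens, as quoted above

def cost(tokens: int) -> float:
    """Dollar cost of one request at the quoted per-token rate."""
    return tokens * RATE

print(f"${cost(150_000):.2f}")  # unoptimized, context-stuffed prompt: $2.25
print(f"${cost(200):.3f}")      # optimized, goal-relevant facts only: $0.003
```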

The Optimization Pipeline

Every call to POST /context runs this pipeline in under 50ms

01
Send Turns
Array of strings
02
Facts Ranked
Extract · dedupe · score
03
Feed to LLM
GPT-4o · Claude · any
POST /context request
{
  "turns": [
    "Acme Corp is on the enterprise plan.",
    "They have a $2M contract.",
    "24/7 SLA support included."
  ]
}
Ranked Facts response
{
  "facts": [
    { "predicate": "customer_tier",
      "args": ["acme_corp", "enterprise"],
      "salience": 0.95 },
    { "predicate": "contract_value",
      "args": ["acme_corp", "2000000"],
      "salience": 0.92 },
    { "predicate": "sla_tier",
      "args": ["acme_corp", "24_7"],
      "salience": 0.90 }
  ],
  "factsReturned": 3,
  "totalFactsInKB": 127,
  "contradictions": 0
}
$0.01
Per Optimized Request
vs $2.25 unoptimized
97%
Token Cost Reduction
500 → 15 facts
< 50ms
Pipeline Latency
Full optimization pass
Diff
Incremental Updates
Only bill for changes

Turns in, facts out, GPT-4o in. Drop into any OpenAI workflow in minutes.

# pip install requests openai
import requests, openai

# Turns in, facts out
resp = requests.post("http://localhost:9300/context", json={
    "turns": ["Acme is enterprise tier", "They have 24/7 SLA"]
})
facts = "\n".join(f"- {f['predicate']}({', '.join(f['args'])})"
                  for f in resp.json()["facts"])

# Feed to GPT-4o — 200 tokens instead of 150K
answer = openai.OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": f"Facts:\n{facts}"},
              {"role": "user", "content": "Is Acme eligible for SLA?"}],
)
# Correct. Sourced. $0.003 instead of $2.25 ✓

Your Agent + OpenAI, Before and After

Same question to GPT-4o. One costs $2.25. The other costs $0.003.

Without NocturnusAI 150K tokens → $2.25
# Stuff everything into the system prompt
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "system",
        "content": entire_knowledge_base
        # 500 facts, 47 rules = 150K tokens
    }, {
        "role": "user",
        "content": "What plan is Acme on?"
    }]
)

# "I believe they're on the premium plan..."
# Wrong. $2.25 wasted. 95% of context irrelevant.
With NocturnusAI 200 tokens → $0.003
# Turns in, facts out
facts = requests.post("/context", json={
    "turns": ["Acme is on enterprise",
              "They have 24/7 SLA"]
}).json()["facts"]

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "system",
        "content": format_facts(facts)
        # 3 facts, 200 tokens
    }, {
        "role": "user",
        "content": "What plan is Acme on?"
    }]
)

# "Acme Corp is on the enterprise plan."
# Correct. Sourced. $0.003. ✓
$2.25
per GPT-4o call, unoptimized
$0.003
per GPT-4o call, with NocturnusAI
Stop overpaying for context

Pay for Signal. Not Noise.

Every token costs money. NocturnusAI's context engine ensures you only pay for facts that matter — then adds verified reasoning, memory lifecycle, and consistency on top.

Context optimizer

97% Context Reduction

POST /context with your conversation turns. Get back ranked facts. That's the whole API. NocturnusAI extracts, deduplicates, and ranks automatically — 97% fewer tokens billed to OpenAI or Claude.

Cost reduction

From $2.25 to $0.003 Per Request

Send your conversation turns to NocturnusAI, feed the ranked facts to GPT-4o or Claude. Two HTTP calls instead of one bloated prompt. Works with any LLM provider.

Quality + savings

Cheaper and More Accurate

Less context means better answers. NocturnusAI catches contradictions before they reach GPT-4o, deduplicates across sources, and ranks by salience. Your agent reasons over signal, not noise.

How your agents connect
Natural language

Plain English In, Verified Facts Out

POST /extract with any text. NocturnusAI calls your LLM to pull out structured facts and stores them automatically. No schema design, no parsing code, no mapping logic.
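A minimal sketch of the call, assuming a local server on port 9300 (per the install step); the `text` field name and the response's `facts` key are assumptions modeled on the /context response shown elsewhere on this page, not a published schema:

```python
import requests

BASE = "http://localhost:9300"  # local server from the install step

def extract_request(text: str) -> dict:
    """Body for POST /extract: free-form English, no schema to design.
    The 'text' field name is an assumption."""
    return {"text": text}

def extract_facts(text: str) -> list:
    resp = requests.post(f"{BASE}/extract", json=extract_request(text))
    resp.raise_for_status()
    # 'facts' mirrors the /context response shape shown earlier
    return resp.json()["facts"]

if __name__ == "__main__":
    for fact in extract_facts("Acme Corp signed a $2M enterprise deal."):
        print(fact["predicate"], fact["args"])
```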

Q&A

Ask Questions, Get Grounded Answers

POST /synthesize with a natural language question. NocturnusAI queries its fact store, runs inference, and returns a sourced answer — not a hallucinated guess from token probabilities.
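A matching sketch for the Q&A endpoint, under the same assumptions (local server on 9300; the `question` field name is illustrative, not a published schema):

```python
import requests

BASE = "http://localhost:9300"  # local server from the install step

def synthesize_request(question: str) -> dict:
    """Body for POST /synthesize: a natural-language question.
    The 'question' field name is an assumption."""
    return {"question": question}

def synthesize(question: str) -> dict:
    """The engine queries its fact store, runs inference, and
    returns a sourced answer."""
    resp = requests.post(f"{BASE}/synthesize",
                         json=synthesize_request(question))
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(synthesize("Is Acme eligible for premium SLA?"))
```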

MCP native

9 MCP Tools, Zero Integration Work

Connect any MCP-compatible agent, IDE, or framework with a two-line config. tell, teach, ask, forget, recall, context, compress, cleanup, predicates — a complete reasoning toolkit with no integration code.
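What that client config might look like — a sketch following the common `mcpServers` convention used by MCP clients; the server name and endpoint path here are assumptions, not the documented entry:

```json
{
  "mcpServers": {
    "nocturnusai": {
      "url": "http://localhost:9300/mcp"
    }
  }
}
```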

What happens underneath
Memory lifecycle

Salience-Ranked Memory

Composite scoring keeps the most relevant facts surfaced for your agent's context window. Episodic patterns consolidate into semantic summaries. Low-relevance facts decay automatically.
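Composite scoring can be pictured as a small function — a sketch only, assuming exponential recency decay and log-damped access frequency; the weighting below is illustrative, not the engine's actual formula:

```python
import math

def salience(recency_s: float, frequency: int, priority: float,
             half_life_s: float = 86_400) -> float:
    """Composite relevance score: recency decay × log-damped
    access frequency × caller-assigned priority (illustrative)."""
    recency = math.exp(-recency_s / half_life_s)  # decays toward 0
    freq = 1 + math.log1p(frequency)              # damped, never dominates
    return recency * freq * priority

# A fact touched an hour ago, accessed 5 times, priority 0.9
print(round(salience(3_600, 5, 0.9), 3))
```

Facts whose score decays below a threshold drop out of the context window automatically.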

Consistency

Truth Maintenance System

Retract a fact and every conclusion that depended on it disappears automatically. No stale inferences, no manual cleanup — the knowledge base stays consistent by design.
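The cascade behavior can be sketched as a minimal justification-based TMS — each derived fact records its premises, and retraction walks the dependency graph (a toy model, not the engine's implementation):

```python
class TMS:
    """Minimal justification-based TMS: derived facts record the
    premises they depend on; retracting a premise cascades."""
    def __init__(self):
        self.facts = set()
        self.dependents = {}  # premise -> facts derived from it

    def assert_fact(self, fact, because=()):
        self.facts.add(fact)
        for premise in because:
            self.dependents.setdefault(premise, set()).add(fact)

    def retract(self, fact):
        self.facts.discard(fact)
        for derived in self.dependents.pop(fact, set()):
            self.retract(derived)  # cascade to every dependent

tms = TMS()
tms.assert_fact("customer_tier(acme, enterprise)")
tms.assert_fact("eligible_for_sla(acme)",
                because=["customer_tier(acme, enterprise)"])
tms.retract("customer_tier(acme, enterprise)")
print("eligible_for_sla(acme)" in tms.facts)  # False — conclusion gone too
```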

Time-aware

Temporal Atoms

Every fact carries validFrom, validUntil, and TTL fields. Facts auto-expire. Query what was true at any point in time. Agents reason over history, not just the present snapshot.
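A point-in-time check is just an interval test. A sketch of a time-aware fact, with field names mirroring the validFrom/validUntil fields described above:

```python
from dataclasses import dataclass

@dataclass
class TemporalAtom:
    """A fact with a validity window; valid_until can encode a TTL
    as valid_from + ttl (field names mirror the description above)."""
    predicate: str
    args: tuple
    valid_from: float   # epoch seconds
    valid_until: float  # epoch seconds

    def holds_at(self, t: float) -> bool:
        return self.valid_from <= t < self.valid_until

atom = TemporalAtom("sla_tier", ("acme", "24_7"),
                    valid_from=1_700_000_000, valid_until=1_731_536_000)
print(atom.holds_at(1_710_000_000))  # True: inside the validity window
print(atom.holds_at(1_740_000_000))  # False: expired, auto-excluded
```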

Transactional

ACID Transactions

Multi-agent systems write concurrently. Transactions ensure atomic commits with contradiction detection — agents can explore hypotheticals without polluting shared state.
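The commit-or-rollback behavior can be sketched in a few lines — writes are buffered, and the merge is rejected if it would introduce a contradiction (here crudely modeled as a fact plus its explicit negation; the engine's actual detection is richer):

```python
class Transaction:
    """Sketch: buffered writes commit atomically only if the merged
    state is contradiction-free; otherwise shared state is untouched."""
    def __init__(self, kb: set):
        self.kb = kb
        self.pending = set()

    def tell(self, fact: str):
        self.pending.add(fact)  # buffered, invisible to other agents

    def commit(self) -> bool:
        merged = self.kb | self.pending
        for f in merged:
            if f.startswith("not ") and f[4:] in merged:
                self.pending.clear()  # rollback
                return False
        self.kb |= self.pending       # atomic commit
        return True

kb = {"customer_tier(acme, enterprise)"}
tx = Transaction(kb)
tx.tell("not customer_tier(acme, enterprise)")
print(tx.commit())  # False — contradiction caught, shared state unchanged
print(len(kb))      # 1
```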

Ops-ready

Production Durability

WAL + snapshots for crash recovery. Leader/follower replication for read scaling. Prometheus metrics. Kubernetes-ready health probes. Self-hosted, your data, your infrastructure.

Interoperable

Universal Protocol Support

MCP, REST, Python SDK, TypeScript SDK, A2A agent discovery. Whatever your stack, NocturnusAI plugs in. New protocols don't require rewriting your knowledge layer.

The Architecture

Not a Plugin. A Cost Engine.

Other tools sit on top of your LLM and add tokens. Nocturnus sits beneath your agents and removes them — delivering only what matters, cutting context costs by 97%, while making every answer provable and traceable.

Your Agent Layer
LangChain · CrewAI · AutoGen · Claude · Custom Agent
Protocol Layer
MCP (9 tools) · HTTP REST API · Python SDK · TypeScript SDK · A2A Protocol
NocturnusAI — The Logic Engine
Hexastore
6-way indexed KB
Dual Inference
Backward + Rete
Context Engine
$2.25 → $0.01/req
Truth Maintenance
Cascade retracts
Temporal Atoms
Time-aware facts
Salience Memory
Recency · freq · priority
ACID Transactions
Atomic reasoning
Multi-Tenancy
DB + tenant headers
agent.py — any framework connects the same way
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from nocturnusai.langchain import get_nocturnusai_tools

# Point your agent at the logic engine
tools = get_nocturnusai_tools("http://localhost:9300")
# tell, ask, teach, forget, recall, context
# — all backed by the Hexastore + inference engine

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using the NocturnusAI tools."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

result = executor.invoke({
  "input": "Is Acme Corp eligible for premium SLA?"
})
# Agent reasons over verified facts, not LLM memory.
# Answer is provable. Traceable. Consistent.
What the infrastructure provides provable · consistent · durable
// 1. Hexastore returns the fact in <100ms
customer_tier(acme, enterprise) → TRUE
// 2. Inference engine derives eligibility via rule
eligible_for_sla(acme) → DERIVED
via: eligible_for_sla(?c) :- customer_tier(?c, enterprise)
// 3. Full proof trace returned to agent
{ "result": true,
  "proof": [
    "customer_tier(acme, enterprise)",
    "rule: eligible_for_sla(?c) :- ..."
  ] }
// 820 tokens billed. Correct. Provable.
// 225x cheaper than context stuffing. ✓
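The derivation traced above can be sketched as a tiny backward chainer over the same fact and rule — a toy restricted to one variable, not the engine's SLD resolver:

```python
# Fact and rule mirroring the trace above.
facts = {("customer_tier", "acme", "enterprise")}
# eligible_for_sla(?c) :- customer_tier(?c, enterprise)
rules = [(("eligible_for_sla", "?c"),
          [("customer_tier", "?c", "enterprise")])]

def prove(goal, proof):
    """Backward chaining: a goal holds if it is a stored fact, or a
    rule head matches and every subgoal in the body can be proven."""
    if goal in facts:
        proof.append(f"fact: {goal}")
        return True
    for head, body in rules:
        if head[0] == goal[0]:
            binding = {head[1]: goal[1]}  # bind ?c to the query argument
            subgoals = [tuple(binding.get(t, t) for t in b) for b in body]
            if all(prove(sg, proof) for sg in subgoals):
                proof.append(f"rule: {head} :- {body}")
                return True
    return False

proof = []
print(prove(("eligible_for_sla", "acme"), proof))  # True, with a proof trace
```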

9 MCP tools, all backed by the logic engine

tell
teach
ask
forget
recall
context
compress
cleanup
predicates
⚡ Zero to production

Up and Running in 60 Seconds

No signup. No cloud dependency. No schemas to design. Production-grade infrastructure, self-hosted, on your terms.

01
Requires Docker

Deploy the Logic Engine

One curl command. Requires Docker with Compose V2 already installed. The installer pulls the image, starts the server via Docker Compose, waits for healthy, and installs the native CLI binary. Nocturnus is live on port 9300 in under 30 seconds.

02
Hexastore + TMS

Load Your World

Assert facts about your domain: customers, products, rules, state, relationships. Everything is structured, typed, and time-aware. Rules you define teach the engine what to derive. The KB grows as your world grows.

03
MCP · SDK · REST

Connect Your Agents

Point any MCP-compatible framework, the Python SDK, TypeScript SDK, or direct HTTP at the running server. Context optimization kicks in automatically — your agents get provable answers at 97% lower token cost.

bash
$ curl -fsSL https://raw.githubusercontent.com/Auctalis/nocturnusai/main/install.sh | bash
✓ Nocturnus live on :9300 — WAL ready · Hexastore ready · Inference ready
✓ CLI installed → nocturnusai
$ curl localhost:9300/health
{ "status": "ok", "ready": true }

Stop paying for tokens your agent doesn't need

$0.01
Per Optimized Request
Down from $2.25 unoptimized
97%
Token Cost Reduction
Pay for 3%, get 100% of the answer
< 50ms
Optimization Pipeline
Goal-driven context in milliseconds
ACID
Transactional Truth
Commit or rollback atomically
MCP
Protocol Native
9 tools, any agent framework