OpenAI charges $15/M tokens. Your agents waste 95% of them.

97% Less Context.
Same Answers.

GPT-4o is $15/M tokens. Claude is $15/M. Every request stuffs 150K tokens of irrelevant context.

NocturnusAI's goal-driven engine delivers only the facts that matter — 500 facts in, 15 out, zero information loss. Stop subsidizing OpenAI with wasted tokens and start optimizing your context.

The context optimization engine for production AI agents.

$ curl -fsSL https://raw.githubusercontent.com/Auctalis/nocturnusai/main/install.sh | bash

Requires Docker with Compose V2 installed. Or install with an AI prompt.

97% Context Reduction · Goal-Driven Optimization · Backward + Forward Chaining · Hexastore (6-way indexed) · Truth Maintenance System · Salience Memory · MCP · A2A · REST
turns in, facts out — that's the whole API
# 1. Send your conversation turns
$ POST /context  { "turns": ["Acme is on enterprise plan", "They have 24/7 SLA"] }
✓ 5 facts extracted & ranked — 200 tokens
# 2. Feed those facts to GPT-4o or Claude
$ openai.chat(system=facts, user="Is Acme eligible for premium SLA?")
✓ "Yes — enterprise tier qualifies for 24/7 SLA support."
sourced from: customer_tier(acme, enterprise) · sla_tier(acme, 24_7)
Token cost for that answer: $2.25 (150K tokens) → $0.003 (200 tokens)
Context Management Engine

Pay for Signal.
Not Noise.

OpenAI charges $15 per million tokens. Anthropic charges $15. Google charges $10. And your agents waste 95% of every request on irrelevant context. NocturnusAI uses goal-driven backward chaining to deliver only what your agent actually needs — slashing your LLM bill by 97%.

Without NocturnusAI Every RAG pipeline
$2.25 per request to OpenAI alone
150K tokens at GPT-4o's $15/M rate — and that's before Claude or Gemini costs
$54,000/day at scale
1,000 requests/hr × 24 hr × $2.25 = token spend that grows linearly with usage
Worse accuracy, higher cost
Contradictory and stale facts in context degrade LLM reasoning quality
Tokens billed per request
95% wasted spend
With NocturnusAI Goal-driven optimization
$0.01 per request
820 tokens — only goal-relevant facts, ranked by salience, with full provenance
Goal-driven backward chaining
Tell us the question — we trace exactly which facts matter via SLD resolution
Cheaper and more accurate
Contradictions caught before they reach the LLM — fewer tokens, better answers
Tokens billed per request
3%
97% cost reduction — every remaining token earns its keep
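The before/after figures above are plain per-token arithmetic. A quick sketch at the quoted $15/M rate:

```python
RATE = 15 / 1_000_000  # $15 per million tokens, as quoted above

def cost(tokens: int) -> float:
    """Dollar cost of one request at the quoted per-token rate."""
    return tokens * RATE

print(f"${cost(150_000):.2f}")  # unoptimized, context-stuffed prompt: $2.25
print(f"${cost(200):.3f}")      # optimized, goal-relevant facts only: $0.003
```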

The Optimization Pipeline

Every call to POST /context runs this pipeline in under 50ms

01
Send Turns
Array of strings
02
Facts Ranked
Extract · dedupe · score
03
Feed to LLM
GPT-4o · Claude · any
POST /context request
{
  "turns": [
    "Acme Corp is on the enterprise plan.",
    "They have a $2M contract.",
    "24/7 SLA support included."
  ]
}
Ranked Facts response
{
  "facts": [
    { "predicate": "customer_tier",
      "args": ["acme_corp", "enterprise"],
      "salience": 0.95 },
    { "predicate": "contract_value",
      "args": ["acme_corp", "2000000"],
      "salience": 0.92 },
    { "predicate": "sla_tier",
      "args": ["acme_corp", "24_7"],
      "salience": 0.90 }
  ],
  "factsReturned": 3,
  "totalFactsInKB": 127,
  "contradictions": 0
}
$0.01
Per Optimized Request
vs $2.25 unoptimized
97%
Token Cost Reduction
500 → 15 facts
< 50ms
Pipeline Latency
Full optimization pass
Diff
Incremental Updates
Only bill for changes

Turns in, facts out, GPT-4o in. Drop into any OpenAI workflow in minutes.

# pip install requests openai
import requests, openai

# Turns in, facts out
resp = requests.post("http://localhost:9300/context", json={
    "turns": ["Acme is enterprise tier", "They have 24/7 SLA"]
})
facts = "\n".join(f"- {f['predicate']}({', '.join(f['args'])})"
                  for f in resp.json()["facts"])

# Feed to GPT-4o — 200 tokens instead of 150K
answer = openai.OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "system", "content": f"Facts:\n{facts}"},
              {"role": "user", "content": "Is Acme eligible for SLA?"}],
)
# Correct. Sourced. $0.003 instead of $2.25 ✓

Your Agent + OpenAI, Before and After

Same question to GPT-4o. One costs $2.25. The other costs $0.003.

Without NocturnusAI 150K tokens → $2.25
# Stuff everything into the system prompt
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "system",
        "content": entire_knowledge_base
        # 500 facts, 47 rules = 150K tokens
    }, {
        "role": "user",
        "content": "What plan is Acme on?"
    }]
)

# "I believe they're on the premium plan..."
# Wrong. $2.25 wasted. 95% of context irrelevant.
With NocturnusAI 200 tokens → $0.003
# Turns in, facts out
facts = requests.post("/context", json={
    "turns": ["Acme is on enterprise",
              "They have 24/7 SLA"]
}).json()["facts"]

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "system",
        "content": format_facts(facts)
        # 3 facts, 200 tokens
    }, {
        "role": "user",
        "content": "What plan is Acme on?"
    }]
)

# "Acme Corp is on the enterprise plan."
# Correct. Sourced. $0.003. ✓
$2.25
per GPT-4o call, unoptimized
$0.003
per GPT-4o call, with NocturnusAI
Stop overpaying for context

Pay for Signal. Not Noise.

Every token costs money. NocturnusAI's context engine ensures you only pay for facts that matter — then adds verified reasoning, memory lifecycle, and consistency on top.

Context optimizer

97% Context Reduction

POST /context with your conversation turns. Get back ranked facts. That's the whole API. NocturnusAI extracts, deduplicates, and ranks automatically — 97% fewer tokens billed to OpenAI or Claude.

Cost reduction

From $2.25 to $0.003 Per Request

Send your conversation turns to NocturnusAI, feed the ranked facts to GPT-4o or Claude. Two HTTP calls instead of one bloated prompt. Works with any LLM provider.

Quality + savings

Cheaper and More Accurate

Less context means better answers. NocturnusAI catches contradictions before they reach GPT-4o, deduplicates across sources, and ranks by salience. Your agent reasons over signal, not noise.

How your agents connect
Natural language

Plain English In, Verified Facts Out

POST /extract with any text. NocturnusAI calls your LLM to pull out structured facts and stores them automatically. No schema design, no parsing code, no mapping logic.
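A minimal sketch of the call, assuming a local server on port 9300 (per the install step); the `text` field name and the response's `facts` key are assumptions modeled on the /context response shown elsewhere on this page, not a published schema:

```python
import requests

BASE = "http://localhost:9300"  # local server from the install step

def extract_request(text: str) -> dict:
    """Body for POST /extract: free-form English, no schema to design.
    The 'text' field name is an assumption."""
    return {"text": text}

def extract_facts(text: str) -> list:
    resp = requests.post(f"{BASE}/extract", json=extract_request(text))
    resp.raise_for_status()
    # 'facts' mirrors the /context response shape shown earlier
    return resp.json()["facts"]

if __name__ == "__main__":
    for fact in extract_facts("Acme Corp signed a $2M enterprise deal."):
        print(fact["predicate"], fact["args"])
```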

Q&A

Ask Questions, Get Grounded Answers

POST /synthesize with a natural language question. NocturnusAI queries its fact store, runs inference, and returns a sourced answer — not a hallucinated guess from token probabilities.
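A matching sketch for the Q&A endpoint, under the same assumptions (local server on 9300; the `question` field name is illustrative, not a published schema):

```python
import requests

BASE = "http://localhost:9300"  # local server from the install step

def synthesize_request(question: str) -> dict:
    """Body for POST /synthesize: a natural-language question.
    The 'question' field name is an assumption."""
    return {"question": question}

def synthesize(question: str) -> dict:
    """The engine queries its fact store, runs inference, and
    returns a sourced answer."""
    resp = requests.post(f"{BASE}/synthesize",
                         json=synthesize_request(question))
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(synthesize("Is Acme eligible for premium SLA?"))
```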

MCP native

9 MCP Tools, Zero Integration Work

Connect any MCP-compatible agent, IDE, or framework with a two-line config. tell, teach, ask, forget, recall, context, compress, cleanup, predicates — a complete reasoning toolkit with no integration code.
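What that client config might look like — a sketch following the common `mcpServers` convention used by MCP clients; the server name and endpoint path here are assumptions, not the documented entry:

```json
{
  "mcpServers": {
    "nocturnusai": {
      "url": "http://localhost:9300/mcp"
    }
  }
}
```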

What happens underneath
Memory lifecycle

Salience-Ranked Memory

Composite scoring keeps the most relevant facts surfaced for your agent's context window. Episodic patterns consolidate into semantic summaries. Low-relevance facts decay automatically.
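Composite scoring can be pictured as a small function — a sketch only, assuming exponential recency decay and log-damped access frequency; the weighting below is illustrative, not the engine's actual formula:

```python
import math

def salience(recency_s: float, frequency: int, priority: float,
             half_life_s: float = 86_400) -> float:
    """Composite relevance score: recency decay × log-damped
    access frequency × caller-assigned priority (illustrative)."""
    recency = math.exp(-recency_s / half_life_s)  # decays toward 0
    freq = 1 + math.log1p(frequency)              # damped, never dominates
    return recency * freq * priority

# A fact touched an hour ago, accessed 5 times, priority 0.9
print(round(salience(3_600, 5, 0.9), 3))
```

Facts whose score decays below a threshold drop out of the context window automatically.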

Consistency

Truth Maintenance System

Retract a fact and every conclusion that depended on it disappears automatically. No stale inferences, no manual cleanup — the knowledge base stays consistent by design.
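The cascade behavior can be sketched as a minimal justification-based TMS — each derived fact records its premises, and retraction walks the dependency graph (a toy model, not the engine's implementation):

```python
class TMS:
    """Minimal justification-based TMS: derived facts record the
    premises they depend on; retracting a premise cascades."""
    def __init__(self):
        self.facts = set()
        self.dependents = {}  # premise -> facts derived from it

    def assert_fact(self, fact, because=()):
        self.facts.add(fact)
        for premise in because:
            self.dependents.setdefault(premise, set()).add(fact)

    def retract(self, fact):
        self.facts.discard(fact)
        for derived in self.dependents.pop(fact, set()):
            self.retract(derived)  # cascade to every dependent

tms = TMS()
tms.assert_fact("customer_tier(acme, enterprise)")
tms.assert_fact("eligible_for_sla(acme)",
                because=["customer_tier(acme, enterprise)"])
tms.retract("customer_tier(acme, enterprise)")
print("eligible_for_sla(acme)" in tms.facts)  # False — conclusion gone too
```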

Time-aware

Temporal Atoms

Every fact carries validFrom, validUntil, and TTL fields. Facts auto-expire. Query what was true at any point in time. Agents reason over history, not just the present snapshot.
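A point-in-time check is just an interval test. A sketch of a time-aware fact, with field names mirroring the validFrom/validUntil fields described above:

```python
from dataclasses import dataclass

@dataclass
class TemporalAtom:
    """A fact with a validity window; valid_until can encode a TTL
    as valid_from + ttl (field names mirror the description above)."""
    predicate: str
    args: tuple
    valid_from: float   # epoch seconds
    valid_until: float  # epoch seconds

    def holds_at(self, t: float) -> bool:
        return self.valid_from <= t < self.valid_until

atom = TemporalAtom("sla_tier", ("acme", "24_7"),
                    valid_from=1_700_000_000, valid_until=1_731_536_000)
print(atom.holds_at(1_710_000_000))  # True: inside the validity window
print(atom.holds_at(1_740_000_000))  # False: expired, auto-excluded
```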

Transactional

ACID Transactions

Multi-agent systems write concurrently. Transactions ensure atomic commits with contradiction detection — agents can explore hypotheticals without polluting shared state.
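The commit-or-rollback behavior can be sketched in a few lines — writes are buffered, and the merge is rejected if it would introduce a contradiction (here crudely modeled as a fact plus its explicit negation; the engine's actual detection is richer):

```python
class Transaction:
    """Sketch: buffered writes commit atomically only if the merged
    state is contradiction-free; otherwise shared state is untouched."""
    def __init__(self, kb: set):
        self.kb = kb
        self.pending = set()

    def tell(self, fact: str):
        self.pending.add(fact)  # buffered, invisible to other agents

    def commit(self) -> bool:
        merged = self.kb | self.pending
        for f in merged:
            if f.startswith("not ") and f[4:] in merged:
                self.pending.clear()  # rollback
                return False
        self.kb |= self.pending       # atomic commit
        return True

kb = {"customer_tier(acme, enterprise)"}
tx = Transaction(kb)
tx.tell("not customer_tier(acme, enterprise)")
print(tx.commit())  # False — contradiction caught, shared state unchanged
print(len(kb))      # 1
```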

Ops-ready

Production Durability

WAL + snapshots for crash recovery. Leader/follower replication for read scaling. Prometheus metrics. Kubernetes-ready health probes. Self-hosted, your data, your infrastructure.

Interoperable

Universal Protocol Support

MCP, REST, Python SDK, TypeScript SDK, A2A agent discovery. Whatever your stack, NocturnusAI plugs in. New protocols don't require rewriting your knowledge layer.

The Architecture

Not a Plugin. A Cost Engine.

Other tools sit on top of your LLM and add tokens. Nocturnus sits beneath your agents and removes them — delivering only what matters, cutting context costs by 97%, while making every answer provable and traceable.

Your Agent Layer
LangChain · CrewAI · AutoGen · Claude · Custom Agent
Protocol Layer
MCP (9 tools) · HTTP REST API · Python SDK · TypeScript SDK · A2A Protocol
NocturnusAI — The Logic Engine
Hexastore
6-way indexed KB
Dual Inference
Backward + Rete
Context Engine
$2.25 → $0.01/req
Truth Maintenance
Cascade retracts
Temporal Atoms
Time-aware facts
Salience Memory
Recency · freq · priority
ACID Transactions
Atomic reasoning
Multi-Tenancy
DB + tenant headers
agent.py — any framework connects the same way
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from nocturnusai.langchain import get_nocturnusai_tools

# Point your agent at the logic engine
tools = get_nocturnusai_tools("http://localhost:9300")
# tell, ask, teach, forget, recall, context
# — all backed by the Hexastore + inference engine

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using the NocturnusAI tools."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
llm = ChatAnthropic(model="claude-sonnet-4-20250514")
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

result = executor.invoke({
  "input": "Is Acme Corp eligible for premium SLA?"
})
# Agent reasons over verified facts, not LLM memory.
# Answer is provable. Traceable. Consistent.
What the infrastructure provides provable · consistent · durable
// 1. Hexastore returns the fact in <100ms
customer_tier(acme, enterprise) → TRUE
// 2. Inference engine derives eligibility via rule
eligible_for_sla(acme) → DERIVED
via: eligible_for_sla(?c) :- customer_tier(?c, enterprise)
// 3. Full proof trace returned to agent
{ "result": true,
  "proof": [
    "customer_tier(acme, enterprise)",
    "rule: eligible_for_sla(?c) :- ..."
  ] }
// 820 tokens billed. Correct. Provable.
// 225x cheaper than context stuffing. ✓
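The derivation traced above can be sketched as a tiny backward chainer over the same fact and rule — a toy restricted to one variable, not the engine's SLD resolver:

```python
# Fact and rule mirroring the trace above.
facts = {("customer_tier", "acme", "enterprise")}
# eligible_for_sla(?c) :- customer_tier(?c, enterprise)
rules = [(("eligible_for_sla", "?c"),
          [("customer_tier", "?c", "enterprise")])]

def prove(goal, proof):
    """Backward chaining: a goal holds if it is a stored fact, or a
    rule head matches and every subgoal in the body can be proven."""
    if goal in facts:
        proof.append(f"fact: {goal}")
        return True
    for head, body in rules:
        if head[0] == goal[0]:
            binding = {head[1]: goal[1]}  # bind ?c to the query argument
            subgoals = [tuple(binding.get(t, t) for t in b) for b in body]
            if all(prove(sg, proof) for sg in subgoals):
                proof.append(f"rule: {head} :- {body}")
                return True
    return False

proof = []
print(prove(("eligible_for_sla", "acme"), proof))  # True, with a proof trace
```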

9 MCP tools, all backed by the logic engine

tell
teach
ask
forget
recall
context
compress
cleanup
predicates
⚡ Zero to production

Up and Running in 60 Seconds

No signup. No cloud dependency. No schemas to design. Production-grade infrastructure, self-hosted, on your terms.

01
Requires Docker

Deploy the Logic Engine

One curl command. Requires Docker with Compose V2 already installed. The installer pulls the image, starts the server via Docker Compose, waits for healthy, and installs the native CLI binary. Nocturnus is live on port 9300 in under 30 seconds.

02
Hexastore + TMS

Load Your World

Assert facts about your domain: customers, products, rules, state, relationships. Everything is structured, typed, and time-aware. Rules you define teach the engine what to derive. The KB grows as your world grows.

03
MCP · SDK · REST

Connect Your Agents

Point any MCP-compatible framework, the Python SDK, TypeScript SDK, or direct HTTP at the running server. Context optimization kicks in automatically — your agents get provable answers at 97% lower token cost.

bash
$ curl -fsSL https://raw.githubusercontent.com/Auctalis/nocturnusai/main/install.sh | bash
✓ Nocturnus live on :9300 — WAL ready · Hexastore ready · Inference ready
✓ CLI installed → nocturnusai
$ curl localhost:9300/health
{ "status": "ok", "ready": true }

Stop paying for tokens your agent doesn't need

$0.01
Per Optimized Request
Down from $2.25 unoptimized
97%
Token Cost Reduction
Pay for 3%, get 100% of the answer
< 50ms
Optimization Pipeline
Goal-driven context in milliseconds
ACID
Transactional Truth
Commit or rollback atomically
MCP
Protocol Native
9 tools, any agent framework