Context costs are the #1 scaling bottleneck

97% Fewer Tokens.
Better Answers. Lower Bill.

Every token you send to an LLM costs money. NocturnusAI's goal-driven context engine delivers only the facts your agent needs — cutting token spend by 97% while making every answer provable and traceable.

Here's the full architecture that makes it possible.

Hallucination

LLMs generate plausible text from statistical patterns. They cannot distinguish between what they've been trained on and what's actually true in your system right now.

No memory lifecycle

Agents accumulate context until the window fills. There's no concept of which facts are still relevant, which have expired, or which should be compressed into summaries.

Reasoning isn't sound

Vector similarity is not logical inference. RAG retrieves text that looks related — it doesn't derive what must be true given what you know. These are fundamentally different operations.

Facts don't have time

Your data changes. A customer upgrades their plan. A system goes offline. Without temporal awareness, your agent reasons over stale facts with no way to know they've expired.

No consistency guarantees

Multi-agent systems share state. Without transactional semantics, two agents making simultaneous updates can corrupt your knowledge base in ways that are invisible and hard to debug.

Zero provenance

When an agent gives a wrong answer, you can't trace why. There's no audit trail showing which facts were used, which rules fired, or which inference path led to the conclusion.

These aren't prompt engineering problems. They're infrastructure problems.
Storage Layer

Hexastore: Six Indexes, One Query

Traditional databases make you think about indexes. Nocturnus doesn't. The Hexastore maintains six simultaneous index permutations — SPO, SOP, PSO, POS, OSP, OPS — over every fact in your knowledge base.

Whatever pattern your query presents — whether the subject, predicate, or object is bound or variable — it hits a direct index lookup. No table scans. No query planner guessing wrong. Sub-100ms retrieval regardless of which terms you specify.

All pattern queries (S??, ?P?, ??O, SP?, S?O, ?PO) hit an index
Thread-safe via ReentrantReadWriteLock — concurrent reads, exclusive writes
Non-binary predicates stored in an optimised fallback map
Scope-aware — queries can be tenant-scoped or global
hexastore — 6 index permutations
# Fact: likes(alice, bob)
# Indexed as:
SPO alice · likes · bob
SOP alice · bob · likes
PSO likes · alice · bob
POS likes · bob · alice
OSP bob · alice · likes
OPS bob · likes · alice
# Any query pattern → direct index hit
likes(?who, bob) → OPS lookup → O(log n)
likes(alice, ?who) → SPO lookup → O(log n)
?pred(alice, bob) → SOP lookup → O(log n)
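The permutation trick above can be sketched in a few lines of Python. This is an illustrative toy, not the Nocturnus implementation: it uses nested dicts where the real engine uses sorted indexes and a ReentrantReadWriteLock, and the `Hexastore` class and its method signatures are invented for the example.

```python
from collections import defaultdict

class Hexastore:
    """Toy triple store: six simultaneous index permutations over (S, P, O)."""
    ORDERS = ("spo", "sop", "pso", "pos", "osp", "ops")

    def __init__(self):
        # one nested map per permutation: first term -> second term -> {third terms}
        self.idx = {od: defaultdict(lambda: defaultdict(set)) for od in self.ORDERS}

    def add(self, s, p, o):
        term = {"s": s, "p": p, "o": o}
        for od in self.ORDERS:  # write the fact into all six indexes
            self.idx[od][term[od[0]]][term[od[1]]].add(term[od[2]])

    def query(self, s=None, p=None, o=None):
        """Any bound/unbound combination resolves to one direct index walk."""
        bound = {"s": s, "p": p, "o": o}
        prefix = sorted(k for k in "spo" if bound[k] is not None)
        # pick the permutation whose leading positions are exactly the bound terms
        od = next(x for x in self.ORDERS if sorted(x[:len(prefix)]) == prefix)
        out, lvl1 = [], self.idx[od]
        for a in ([bound[od[0]]] if bound[od[0]] is not None else list(lvl1)):
            lvl2 = lvl1.get(a) or {}
            for b in ([bound[od[1]]] if bound[od[1]] is not None else list(lvl2)):
                for c in lvl2.get(b) or set():
                    if bound[od[2]] in (None, c):
                        t = {od[0]: a, od[1]: b, od[2]: c}
                        out.append((t["s"], t["p"], t["o"]))
        return out
```

A query like `query(p="likes", o="bob")` selects the POS index, so both bound terms are direct lookups and only the unbound subject is enumerated.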
inference — backward + forward chaining
# Backward chaining — goal-driven
POST /ask
{ "predicate": "grandparent", "args": ["alice", "?who"], "withProof": true }
→ grandparent(alice, charlie)
proof: parent(alice,bob) ∧ parent(bob,charlie)
      via rule: grandparent(?x,?z) :- parent(?x,?y), parent(?y,?z)
# Forward chaining — reactive (Rete engine)
POST /tell
{ "predicate": "parent", "args": ["bob", "carol"] }
✓ parent(bob, carol) asserted
✓ grandparent(alice, carol) derived automatically
Rete fired 1 rule match
Reasoning Layer

Two Engines. One Coherent Truth.

Nocturnus runs two inference engines simultaneously. Backward chaining handles goal-driven queries: you ask a question, the SLD resolver works backwards through your rules to find if and how the answer can be proved. It returns the full derivation chain.

Forward chaining via the Rete network handles reactive inference: the moment you assert a new fact, the engine immediately computes all new conclusions that follow from your rules. Your knowledge base is always in a consistent, fully-derived state.

Both engines operate deterministically. The same facts and rules always produce the same conclusions. There is no inference drift, no probabilistic variance, no randomness.
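A miniature version of goal-driven backward chaining can be sketched as a generic SLD resolver. This is written for illustration only, assuming flat atoms as tuples of strings with `?`-prefixed variables; the `solve`, `unify`, and `resolve` names are invented here and are not the Nocturnus resolver.

```python
import itertools

_fresh = itertools.count()

def _walk(t, binds):
    # follow variable-to-variable binding chains to the current value
    while t.startswith("?") and t in binds:
        t = binds[t]
    return t

def unify(a, b, binds):
    """Unify two flat atoms; '?'-prefixed terms are variables."""
    if len(a) != len(b):
        return None
    binds = dict(binds)
    for x, y in zip(a, b):
        x, y = _walk(x, binds), _walk(y, binds)
        if x == y:
            continue
        if x.startswith("?"):
            binds[x] = y
        elif y.startswith("?"):
            binds[y] = x
        else:
            return None
    return binds

def solve(goals, binds, facts, rules):
    """SLD resolution: prove goals left to right, yielding successful bindings."""
    if not goals:
        yield binds
        return
    goal, rest = goals[0], goals[1:]
    for fact in facts:                       # try ground facts
        b2 = unify(goal, fact, binds)
        if b2 is not None:
            yield from solve(rest, b2, facts, rules)
    for head, body in rules:                 # try rules, standardized apart
        n = next(_fresh)
        ren = lambda t: f"{t}#{n}" if t.startswith("?") else t
        b2 = unify(goal, tuple(map(ren, head)), binds)
        if b2 is not None:
            yield from solve([tuple(map(ren, atom)) for atom in body] + rest,
                             b2, facts, rules)

def resolve(var, binds):
    return _walk(var, binds)

# grandparent(?x, ?z) :- parent(?x, ?y), parent(?y, ?z)
facts = [("parent", "alice", "bob"), ("parent", "bob", "charlie")]
rules = [(("grandparent", "?x", "?z"),
          [("parent", "?x", "?y"), ("parent", "?y", "?z")])]
solutions = list(solve([("grandparent", "alice", "?who")], {}, facts, rules))
```

The binding chain yielded for each solution doubles as a derivation trace, which is the same idea as the `withProof` response shown earlier.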

Consistency Layer

Truth Maintenance: Consistency Without the Bookkeeping

When you retract a fact, everything that was derived from it — directly or transitively — is automatically removed from the knowledge base. No manual cleanup. No stale conclusions. No ghost data that leads agents astray.

The ProvenanceTracker maintains a full dependency graph of every inference. When a premise disappears, it walks the graph and retracts every conclusion that depended on it. The operation is atomic.

This is the difference between a database and a reasoning system. Databases let you delete a row and leave referential integrity to you. Nocturnus maintains logical integrity automatically.

truth maintenance — cascade retraction
# KB state: 2 asserted, 1 derived
parent(alice, bob) ← asserted
parent(bob, charlie) ← asserted
grandparent(alice, charlie) ← derived
# Retract one fact
POST /forget
{ "predicate": "parent", "args": ["alice", "bob"] }
# KB state: TMS auto-cleaned
parent(alice, bob) ← retracted
parent(bob, charlie) ← retained
grandparent(alice, charlie) ← cascade removed
✓ KB is consistent. Zero stale inferences.
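The cascade above can be modeled with a small dependency graph. This is a hypothetical sketch: the real ProvenanceTracker records full inference provenance and retracts atomically, while this toy does neither and its class and method names are invented.

```python
class TMS:
    """Toy truth maintenance: a dependency graph from premises to conclusions."""
    def __init__(self):
        self.facts = set()
        self.supports = {}   # derived fact -> set of premises it depends on

    def assert_fact(self, fact):
        self.facts.add(fact)

    def derive(self, conclusion, premises):
        self.facts.add(conclusion)
        self.supports[conclusion] = set(premises)

    def retract(self, fact):
        """Remove the fact plus everything that transitively depended on it."""
        stack = [fact]
        while stack:
            f = stack.pop()
            self.facts.discard(f)
            # any conclusion supported by f loses its justification
            for concl, prems in list(self.supports.items()):
                if f in prems:
                    del self.supports[concl]
                    stack.append(concl)

tms = TMS()
tms.assert_fact(("parent", "alice", "bob"))
tms.assert_fact(("parent", "bob", "charlie"))
tms.derive(("grandparent", "alice", "charlie"),
           [("parent", "alice", "bob"), ("parent", "bob", "charlie")])
tms.retract(("parent", "alice", "bob"))
```

After the retraction, `grandparent(alice, charlie)` is gone because its justification set contained the retracted premise, while `parent(bob, charlie)` survives untouched.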
temporal — time-aware facts
# Fact with TTL (seconds) — auto-expires in 1 hour
POST /tell
{
"predicate": "session_active",
"args": ["user_123"],
"ttl": 3600
}
# Point-in-time query
POST /memory/query/temporal
{ "at": "2025-01-15T14:00:00Z" }
→ Returns all facts valid at that instant
# Bounded validity
"validFrom": "2025-Q1-start"
"validUntil": "2025-Q1-end"
Fact only true within its window
Temporal Layer

Facts That Know When They're True

The real world is not a static snapshot. Subscriptions expire. Sessions end. Prices change. Regulations take effect on specific dates. Every fact in Nocturnus carries validFrom, validUntil, createdAt, and ttl fields.

Set a TTL and the fact auto-expires. Define validity windows for seasonal rules. Query what was true at any arbitrary point in time. Build audit trails where you can reconstruct exactly what your agent knew at the moment it made a decision.

Temporal awareness isn't a feature layer added on top — it's woven into the atom data model, which means every query, every inference, and every export respects time automatically.
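The visibility rule behind those four fields can be sketched as a single predicate. The `is_valid` helper and the dict-based fact shape below are assumptions made for illustration, not the Nocturnus atom model.

```python
from datetime import datetime, timedelta, timezone

def is_valid(fact, at):
    """A fact is visible at `at` iff it is inside its validity window and,
    if it carries a TTL, has not expired since creation."""
    if fact.get("validFrom") and at < fact["validFrom"]:
        return False
    if fact.get("validUntil") and at > fact["validUntil"]:
        return False
    ttl = fact.get("ttl")
    if ttl is not None and at > fact["createdAt"] + timedelta(seconds=ttl):
        return False
    return True

now = datetime(2025, 1, 15, 14, 0, tzinfo=timezone.utc)
session = {"atom": "session_active(user_123)",
           "createdAt": now - timedelta(hours=2), "ttl": 3600}      # TTL elapsed
promo = {"atom": "discount(q1_promo)", "createdAt": now,
         "validFrom": datetime(2025, 1, 1, tzinfo=timezone.utc),
         "validUntil": datetime(2025, 3, 31, tzinfo=timezone.utc)}  # in window

# a point-in-time query is just the KB filtered through the same predicate
visible = [f["atom"] for f in (session, promo) if is_valid(f, now)]
```

Running the same filter with a different `at` reconstructs what was visible at any past instant, which is the mechanism behind audit-trail queries.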

Memory Layer

Memory That Manages Itself

Agent context windows are finite. What goes in matters enormously. Nocturnus computes a composite salience score for every fact — combining recency, access frequency, and explicit priority — and uses it to determine what deserves space in your agent's context window.

/memory/context returns the top-K most salient facts, ranked. /memory/consolidate compresses repeated episodic patterns into compact semantic summaries — the difference between "the agent remembered 47 similar login events" and "the agent remembers the login pattern." /memory/decay evicts facts that have fallen below relevance thresholds.

Real-time memory change events stream via SSE at GET /memory/events — so your orchestration layer can react the moment anything significant changes.

memory — salience lifecycle
# Get top-K salient facts for agent context
GET /memory/context?limit=10
0.94 customer_tier(acme, enterprise)
0.87 sla_enabled(acme)
0.71 last_contacted(acme, 2025-01-14)
0.43 location(acme, austin)
# Consolidate episodic → semantic
POST /memory/consolidate
✓ 47 login_event atoms → 1 semantic summary
✓ Context window: 47 tokens → 3 tokens
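One plausible shape for a composite salience score, sketched for illustration: exponential recency decay plus log-scaled access frequency plus explicit priority. The weights, half-life, and field names here are invented, not Nocturnus's actual formula.

```python
import math, time

def salience(fact, now, half_life=3600.0, w_rec=0.5, w_freq=0.3, w_pri=0.2):
    """Composite score in [0, 1]: recency decays exponentially, frequency is
    log-scaled so heavy hitters don't dominate, priority is taken as-is."""
    recency = math.exp(-(now - fact["last_access"]) / half_life)   # 1.0 = just touched
    frequency = min(1.0, math.log1p(fact["access_count"]) / math.log1p(100))
    return w_rec * recency + w_freq * frequency + w_pri * fact["priority"]

now = time.time()
facts = [
    {"atom": "customer_tier(acme, enterprise)", "last_access": now - 60,
     "access_count": 80, "priority": 1.0},
    {"atom": "location(acme, austin)", "last_access": now - 7200,
     "access_count": 3, "priority": 0.2},
]

# /memory/context?limit=K reduces to: rank by salience, take the top K
top_k = sorted(facts, key=lambda f: salience(f, now), reverse=True)[:10]
```

The design point is that no single signal wins: a hot but low-priority fact and a stale but critical one can both earn context space.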
transactions — ACID reasoning
# Begin a transaction
POST /tx/begin → TX_ID: "tx_a3f9"
# Assert facts within the transaction
POST /tell X-Transaction-ID: tx_a3f9
{ "predicate": "approved", "args": ["order_99"] }
POST /tell X-Transaction-ID: tx_a3f9
{ "predicate": "shipped", "args": ["order_99"] }
# Commit or rollback atomically
POST /tx/commit/tx_a3f9
✓ Both facts committed. Contradiction check passed.
# Or rollback — nothing is visible to other agents
POST /tx/rollback/tx_a3f9
✓ Transaction aborted. KB unchanged.
Transaction Layer

Atomic Reasoning for Multi-Agent Systems

In a multi-agent system, agents read and write shared state concurrently. Without transactional guarantees, two agents updating the same knowledge simultaneously can leave your KB in an inconsistent state that's invisible until something breaks.

Nocturnus gives you full ACID transactions. Begin a transaction, assert multiple facts, and they're invisible to all other agents until you commit. On commit, a contradiction detector runs — if your assertions conflict with existing rules or constraints, the transaction is rejected and nothing changes.

This also enables hypothetical reasoning: open a transaction, explore a set of conclusions, then rollback without affecting shared state. Your agent can reason "what would be true if X were the case" safely.
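The begin/tell/commit/rollback lifecycle can be sketched with a staged-write buffer. This toy ignores isolation between concurrent writers, and its `check` hook is an invented stand-in for the contradiction detector.

```python
import itertools

class KB:
    """Toy KB: writes inside a transaction stay invisible until commit."""
    def __init__(self, check=lambda facts: True):
        self.facts = set()
        self.staged = {}                       # tx_id -> pending facts
        self.check = check                     # stand-in contradiction detector
        self._ids = itertools.count(1)

    def begin(self):
        tx = f"tx_{next(self._ids)}"
        self.staged[tx] = []
        return tx

    def tell(self, fact, tx=None):
        if tx is None:
            self.facts.add(fact)               # autocommit path
        else:
            self.staged[tx].append(fact)       # buffered, invisible to readers

    def commit(self, tx):
        candidate = self.facts | set(self.staged.pop(tx))
        if not self.check(candidate):          # reject contradictory states whole
            raise ValueError("commit rejected: contradiction detected")
        self.facts = candidate                 # all-or-nothing swap

    def rollback(self, tx):
        self.staged.pop(tx)                    # pending writes simply vanish

# constraint: an order cannot be both shipped and cancelled
no_conflict = lambda fs: not {("cancelled", "order_99"),
                              ("shipped", "order_99")} <= fs
kb = KB(check=no_conflict)
tx = kb.begin()
kb.tell(("approved", "order_99"), tx)
kb.tell(("shipped", "order_99"), tx)
seen_before_commit = ("approved", "order_99") in kb.facts   # still invisible
kb.commit(tx)
```

Hypothetical reasoning falls out for free: stage a what-if world in a transaction, inspect it, then `rollback` and the shared KB never saw it.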

Operations Layer

Production-Grade From the Start

Nocturnus is not a prototype or a research engine. It's designed for production deployment. Every mutation is appended to a Write-Ahead Log before it's applied to the in-memory store — so a crash at any point leaves you with a recoverable state. Periodic full-state snapshots keep recovery fast.

Leader/follower replication streams the WAL to read replicas, letting you scale query throughput horizontally. Prometheus metrics at /metrics cover 20+ signals — fact operations, inference latency, MCP tool call rates, memory pressure, LLM extraction calls. Grafana dashboards included.

WAL + Snapshots
Crash recovery
Leader/Follower
Read scaling
Prometheus metrics
20+ signals
Kubernetes-ready
YAML included
/health + /health/ready
Probe endpoints
AES-256 at rest
Optional encryption
deployment — production stack
# Docker Compose — up in 30 seconds
docker compose up -d
✓ Server live on :9300
✓ WAL initialised at ./data/wal
✓ Prometheus at :9300/metrics
# With monitoring stack
docker compose --profile monitoring up -d
✓ Grafana at :3000 (dashboards pre-loaded)
✓ Prometheus scraping :9300/metrics
# Kubernetes liveness + readiness
livenessProbe: GET /health
readinessProbe: GET /health/ready
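The WAL-then-apply discipline is easy to see in miniature. This toy replays a JSON-lines log into a set of facts; the record format is invented for the example, and the real engine also takes periodic snapshots so replay stays short.

```python
import json, os, tempfile

class WAL:
    """Toy write-ahead log: every mutation is durable before it is applied."""
    def __init__(self, path):
        self.path = path

    def append(self, op, fact):
        with open(self.path, "a") as f:
            f.write(json.dumps({"op": op, "fact": fact}) + "\n")
            f.flush()
            os.fsync(f.fileno())   # on disk before the in-memory store changes

    def replay(self):
        """Crash recovery: rebuild the store by re-applying the log in order."""
        facts = set()
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["op"] == "tell":
                        facts.add(tuple(rec["fact"]))
                    elif rec["op"] == "forget":
                        facts.discard(tuple(rec["fact"]))
        return facts

path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = WAL(path)
wal.append("tell", ["parent", "alice", "bob"])
wal.append("tell", ["parent", "bob", "charlie"])
wal.append("forget", ["parent", "alice", "bob"])
recovered = wal.replay()   # the state a restarted server would see
```

Because every mutation hits the log before the store, a crash between the two leaves at worst one already-logged operation to re-apply, never a lost write.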

Connects to Everything in the Agent Ecosystem

One running server. Every protocol. Your agents don't need to change — they just point to Nocturnus.

MCP (Model Context Protocol)

2025-11-25 spec

HTTP + SSE transport. 16 tools exposed — facts, rules, inference, memory lifecycle, aggregation, and scope management. Any MCP-compatible agent, IDE, or framework connects with a few lines of JSON config.

// .cursor/mcp.json
{
  "mcpServers": {
    "nocturnus": {
      "url": "http://localhost:9300/mcp/sse"
    }
  }
}

A2A (Agent-to-Agent Protocol)

Discovery

Nocturnus publishes an A2A Agent Card at /.well-known/agent.json describing its capabilities. Other agents in your ecosystem can discover and interact with it automatically — no manual coordination.

GET /.well-known/agent.json
→ Agent Card: capabilities, endpoints,
  supported protocols, authentication

Python SDK + LangChain

pip install nocturnusai

Async and sync clients. Six pre-built LangChain tools (tell, ask, teach, forget, recall, context) ready to drop into any agent executor. Full Pydantic models for every DTO.

from nocturnusai.langchain import (
    get_nocturnusai_tools
)

tools = get_nocturnusai_tools(client)
# → tell, ask, teach, forget,
#   recall, context (6 tools)

Your agents are ready for better infrastructure.

Start with docker compose up and have your first fact stored in under a minute. No schemas, no migrations, no infra team required.