Context costs are the #1 scaling bottleneck

97% Fewer Tokens.
Better Answers. Lower Bill.

Every token you send to an LLM costs money. NocturnusAI's goal-driven context engine delivers only the facts your agent needs — cutting token spend by 97% while making every answer provable and traceable.

Here's the full architecture that makes it possible.

Hallucination

LLMs generate plausible text from statistical patterns. They cannot distinguish between what they've been trained on and what's actually true in your system right now.

No memory lifecycle

Agents accumulate context until the window fills. There's no concept of which facts are still relevant, which have expired, or which should be compressed into summaries.

Reasoning isn't sound

Vector similarity is not logical inference. RAG retrieves text that looks related — it doesn't derive what must be true given what you know. These are fundamentally different operations.

Facts don't have time

Your data changes. A customer upgrades their plan. A system goes offline. Without temporal awareness, your agent reasons over stale facts with no way to know they've expired.

No consistency guarantees

Multi-agent systems share state. Without transactional semantics, two agents making simultaneous updates can corrupt your knowledge base in ways that are invisible and hard to debug.

Zero provenance

When an agent gives a wrong answer, you can't trace why. There's no audit trail showing which facts were used, which rules fired, or which inference path led to the conclusion.

These aren't prompt engineering problems. They're infrastructure problems.
Storage Layer

Hexastore: Six Indexes, One Query

Traditional databases make you think about indexes. Nocturnus doesn't. The Hexastore maintains six simultaneous index permutations — SPO, SOP, PSO, POS, OSP, OPS — over every fact in your knowledge base.

Whatever pattern your query presents — whether the subject, predicate, or object is bound or variable — it hits a direct index lookup. No table scans. No query planner guessing wrong. Sub-100ms retrieval regardless of which terms you specify.

All pattern queries (S??, ?P?, ??O, SP?, S?O, ?PO) hit an index
Thread-safe via ReentrantReadWriteLock — concurrent reads, exclusive writes
Non-binary predicates stored in an optimised fallback map
Scope-aware — queries can be tenant-scoped or global
hexastore — 6 index permutations
# Fact: likes(alice, bob)
# Indexed as:
SPO alice · likes · bob
SOP alice · bob · likes
PSO likes · alice · bob
POS likes · bob · alice
OSP bob · alice · likes
OPS bob · likes · alice
# Any query pattern → direct index hit
likes(?who, bob) → OPS lookup → O(log n)
likes(alice, ?who) → SPO lookup → O(log n)
?pred(alice, bob) → SOP lookup → O(log n)
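The permutation trick above can be sketched in a few lines of Python. This is an illustrative toy, not the Nocturnus implementation: it uses nested dicts where the real engine uses sorted indexes and a ReentrantReadWriteLock, and the `Hexastore` class and its method signatures are invented for the example.

```python
from collections import defaultdict

class Hexastore:
    """Toy triple store: six simultaneous index permutations over (S, P, O)."""
    ORDERS = ("spo", "sop", "pso", "pos", "osp", "ops")

    def __init__(self):
        # one nested map per permutation: first term -> second term -> {third terms}
        self.idx = {od: defaultdict(lambda: defaultdict(set)) for od in self.ORDERS}

    def add(self, s, p, o):
        term = {"s": s, "p": p, "o": o}
        for od in self.ORDERS:  # write the fact into all six indexes
            self.idx[od][term[od[0]]][term[od[1]]].add(term[od[2]])

    def query(self, s=None, p=None, o=None):
        """Any bound/unbound combination resolves to one direct index walk."""
        bound = {"s": s, "p": p, "o": o}
        prefix = sorted(k for k in "spo" if bound[k] is not None)
        # pick the permutation whose leading positions are exactly the bound terms
        od = next(x for x in self.ORDERS if sorted(x[:len(prefix)]) == prefix)
        out, lvl1 = [], self.idx[od]
        for a in ([bound[od[0]]] if bound[od[0]] is not None else list(lvl1)):
            lvl2 = lvl1.get(a) or {}
            for b in ([bound[od[1]]] if bound[od[1]] is not None else list(lvl2)):
                for c in lvl2.get(b) or set():
                    if bound[od[2]] in (None, c):
                        t = {od[0]: a, od[1]: b, od[2]: c}
                        out.append((t["s"], t["p"], t["o"]))
        return out
```

A query like `query(p="likes", o="bob")` selects the POS index, so both bound terms are direct lookups and only the unbound subject is enumerated.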
inference — backward + forward chaining
# Backward chaining — goal-driven
POST /ask
{ "predicate": "grandparent", "args": ["alice", "?who"], "withProof": true }
→ grandparent(alice, charlie)
proof: parent(alice,bob) ∧ parent(bob,charlie)
      via rule: grandparent(?x,?z) :- parent(?x,?y), parent(?y,?z)
# Forward chaining — reactive (Rete engine)
POST /tell
{ "predicate": "parent", "args": ["bob", "carol"] }
✓ parent(bob, carol) asserted
✓ grandparent(alice, carol) derived automatically
Rete fired 1 rule match
Reasoning Layer

Two Engines. One Coherent Truth.

Nocturnus runs two inference engines simultaneously. Backward chaining handles goal-driven queries: you ask a question, the SLD resolver works backwards through your rules to find if and how the answer can be proved. It returns the full derivation chain.

Forward chaining via the Rete network handles reactive inference: the moment you assert a new fact, the engine immediately computes all new conclusions that follow from your rules. Your knowledge base is always in a consistent, fully-derived state.

Both engines operate deterministically. The same facts and rules always produce the same conclusions. There is no inference drift, no probabilistic variance, no randomness.
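A miniature version of goal-driven backward chaining can be sketched as a generic SLD resolver. This is written for illustration only, assuming flat atoms as tuples of strings with `?`-prefixed variables; the `solve`, `unify`, and `resolve` names are invented here and are not the Nocturnus resolver.

```python
import itertools

_fresh = itertools.count()

def _walk(t, binds):
    # follow variable-to-variable binding chains to the current value
    while t.startswith("?") and t in binds:
        t = binds[t]
    return t

def unify(a, b, binds):
    """Unify two flat atoms; '?'-prefixed terms are variables."""
    if len(a) != len(b):
        return None
    binds = dict(binds)
    for x, y in zip(a, b):
        x, y = _walk(x, binds), _walk(y, binds)
        if x == y:
            continue
        if x.startswith("?"):
            binds[x] = y
        elif y.startswith("?"):
            binds[y] = x
        else:
            return None
    return binds

def solve(goals, binds, facts, rules):
    """SLD resolution: prove goals left to right, yielding successful bindings."""
    if not goals:
        yield binds
        return
    goal, rest = goals[0], goals[1:]
    for fact in facts:                       # try ground facts
        b2 = unify(goal, fact, binds)
        if b2 is not None:
            yield from solve(rest, b2, facts, rules)
    for head, body in rules:                 # try rules, standardized apart
        n = next(_fresh)
        ren = lambda t: f"{t}#{n}" if t.startswith("?") else t
        b2 = unify(goal, tuple(map(ren, head)), binds)
        if b2 is not None:
            yield from solve([tuple(map(ren, atom)) for atom in body] + rest,
                             b2, facts, rules)

def resolve(var, binds):
    return _walk(var, binds)

# grandparent(?x, ?z) :- parent(?x, ?y), parent(?y, ?z)
facts = [("parent", "alice", "bob"), ("parent", "bob", "charlie")]
rules = [(("grandparent", "?x", "?z"),
          [("parent", "?x", "?y"), ("parent", "?y", "?z")])]
solutions = list(solve([("grandparent", "alice", "?who")], {}, facts, rules))
```

The binding chain yielded for each solution doubles as a derivation trace, which is the same idea as the `withProof` response shown earlier.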

Consistency Layer

Truth Maintenance: Consistency Without the Bookkeeping

When you retract a fact, everything that was derived from it — directly or transitively — is automatically removed from the knowledge base. No manual cleanup. No stale conclusions. No ghost data that leads agents astray.

The ProvenanceTracker maintains a full dependency graph of every inference. When a premise disappears, it walks the graph and retracts every conclusion that depended on it. The operation is atomic.

This is the difference between a database and a reasoning system. Databases let you delete a row and leave referential integrity to you. Nocturnus maintains logical integrity automatically.

truth maintenance — cascade retraction
# KB state: 2 asserted, 1 derived
parent(alice, bob) ← asserted
parent(bob, charlie) ← asserted
grandparent(alice, charlie) ← derived
# Retract one fact
POST /forget
{ "predicate": "parent", "args": ["alice", "bob"] }
# KB state: TMS auto-cleaned
parent(alice, bob) ← retracted
parent(bob, charlie) ← retained
grandparent(alice, charlie) ← cascade removed
✓ KB is consistent. Zero stale inferences.
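The cascade above can be modeled with a small dependency graph. This is a hypothetical sketch: the real ProvenanceTracker records full inference provenance and retracts atomically, while this toy does neither and its class and method names are invented.

```python
class TMS:
    """Toy truth maintenance: a dependency graph from premises to conclusions."""
    def __init__(self):
        self.facts = set()
        self.supports = {}   # derived fact -> set of premises it depends on

    def assert_fact(self, fact):
        self.facts.add(fact)

    def derive(self, conclusion, premises):
        self.facts.add(conclusion)
        self.supports[conclusion] = set(premises)

    def retract(self, fact):
        """Remove the fact plus everything that transitively depended on it."""
        stack = [fact]
        while stack:
            f = stack.pop()
            self.facts.discard(f)
            # any conclusion supported by f loses its justification
            for concl, prems in list(self.supports.items()):
                if f in prems:
                    del self.supports[concl]
                    stack.append(concl)

tms = TMS()
tms.assert_fact(("parent", "alice", "bob"))
tms.assert_fact(("parent", "bob", "charlie"))
tms.derive(("grandparent", "alice", "charlie"),
           [("parent", "alice", "bob"), ("parent", "bob", "charlie")])
tms.retract(("parent", "alice", "bob"))
```

After the retraction, `grandparent(alice, charlie)` is gone because its justification set contained the retracted premise, while `parent(bob, charlie)` survives untouched.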
temporal — time-aware facts
# Fact with TTL (seconds) — auto-expires in 1 hour
POST /tell
{
"predicate": "session_active",
"args": ["user_123"],
"ttl": 3600
}
# Point-in-time query
POST /memory/query/temporal
{ "at": "2025-01-15T14:00:00Z" }
→ Returns all facts valid at that instant
# Bounded validity
"validFrom": "2025-Q1-start"
"validUntil": "2025-Q1-end"
Fact only true within its window
Temporal Layer

Facts That Know When They're True

The real world is not a static snapshot. Subscriptions expire. Sessions end. Prices change. Regulations take effect on specific dates. Every fact in Nocturnus carries validFrom, validUntil, createdAt, and ttl fields.

Set a TTL and the fact auto-expires. Define validity windows for seasonal rules. Query what was true at any arbitrary point in time. Build audit trails where you can reconstruct exactly what your agent knew at the moment it made a decision.

Temporal awareness isn't a feature layer added on top — it's woven into the atom data model, which means every query, every inference, and every export respects time automatically.
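The visibility rule behind those four fields can be sketched as a single predicate. The `is_valid` helper and the dict-based fact shape below are assumptions made for illustration, not the Nocturnus atom model.

```python
from datetime import datetime, timedelta, timezone

def is_valid(fact, at):
    """A fact is visible at `at` iff it is inside its validity window and,
    if it carries a TTL, has not expired since creation."""
    if fact.get("validFrom") and at < fact["validFrom"]:
        return False
    if fact.get("validUntil") and at > fact["validUntil"]:
        return False
    ttl = fact.get("ttl")
    if ttl is not None and at > fact["createdAt"] + timedelta(seconds=ttl):
        return False
    return True

now = datetime(2025, 1, 15, 14, 0, tzinfo=timezone.utc)
session = {"atom": "session_active(user_123)",
           "createdAt": now - timedelta(hours=2), "ttl": 3600}      # TTL elapsed
promo = {"atom": "discount(q1_promo)", "createdAt": now,
         "validFrom": datetime(2025, 1, 1, tzinfo=timezone.utc),
         "validUntil": datetime(2025, 3, 31, tzinfo=timezone.utc)}  # in window

# a point-in-time query is just the KB filtered through the same predicate
visible = [f["atom"] for f in (session, promo) if is_valid(f, now)]
```

Running the same filter with a different `at` reconstructs what was visible at any past instant, which is the mechanism behind audit-trail queries.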

Memory Layer

Memory That Manages Itself

Agent context windows are finite. What goes in matters enormously. Nocturnus computes a composite salience score for every fact — combining recency, access frequency, and explicit priority — and uses it to determine what deserves space in your agent's context window.

/memory/context returns the top-K most salient facts, ranked. /memory/consolidate compresses repeated episodic patterns into compact semantic summaries — the difference between "the agent remembered 47 similar login events" and "the agent remembers the login pattern." /memory/decay evicts facts that have fallen below relevance thresholds.

Real-time memory change events stream via SSE at GET /memory/events — so your orchestration layer can react the moment anything significant changes.

memory — salience lifecycle
# Get top-K salient facts for agent context
GET /memory/context?limit=10
0.94 customer_tier(acme, enterprise)
0.87 sla_enabled(acme)
0.71 last_contacted(acme, 2025-01-14)
0.43 location(acme, austin)
# Consolidate episodic → semantic
POST /memory/consolidate
✓ 47 login_event atoms → 1 semantic summary
✓ Context window: 47 tokens → 3 tokens
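One plausible shape for a composite salience score, sketched for illustration: exponential recency decay plus log-scaled access frequency plus explicit priority. The weights, half-life, and field names here are invented, not Nocturnus's actual formula.

```python
import math, time

def salience(fact, now, half_life=3600.0, w_rec=0.5, w_freq=0.3, w_pri=0.2):
    """Composite score in [0, 1]: recency decays exponentially, frequency is
    log-scaled so heavy hitters don't dominate, priority is taken as-is."""
    recency = math.exp(-(now - fact["last_access"]) / half_life)   # 1.0 = just touched
    frequency = min(1.0, math.log1p(fact["access_count"]) / math.log1p(100))
    return w_rec * recency + w_freq * frequency + w_pri * fact["priority"]

now = time.time()
facts = [
    {"atom": "customer_tier(acme, enterprise)", "last_access": now - 60,
     "access_count": 80, "priority": 1.0},
    {"atom": "location(acme, austin)", "last_access": now - 7200,
     "access_count": 3, "priority": 0.2},
]

# /memory/context?limit=K reduces to: rank by salience, take the top K
top_k = sorted(facts, key=lambda f: salience(f, now), reverse=True)[:10]
```

The design point is that no single signal wins: a hot but low-priority fact and a stale but critical one can both earn context space.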
transactions — ACID reasoning
# Begin a transaction
POST /tx/begin → TX_ID: "tx_a3f9"
# Assert facts within the transaction
POST /tell X-Transaction-ID: tx_a3f9
{ "predicate": "approved", "args": ["order_99"] }
POST /tell X-Transaction-ID: tx_a3f9
{ "predicate": "shipped", "args": ["order_99"] }
# Commit or rollback atomically
POST /tx/commit/tx_a3f9
✓ Both facts committed. Contradiction check passed.
# Or rollback — nothing is visible to other agents
POST /tx/rollback/tx_a3f9
✓ Transaction aborted. KB unchanged.
Transaction Layer

Atomic Reasoning for Multi-Agent Systems

In a multi-agent system, agents read and write shared state concurrently. Without transactional guarantees, two agents updating the same knowledge simultaneously can leave your KB in an inconsistent state that's invisible until something breaks.

Nocturnus gives you full ACID transactions. Begin a transaction, assert multiple facts, and they're invisible to all other agents until you commit. On commit, a contradiction detector runs — if your assertions conflict with existing rules or constraints, the transaction is rejected and nothing changes.

This also enables hypothetical reasoning: open a transaction, explore a set of conclusions, then rollback without affecting shared state. Your agent can reason "what would be true if X were the case" safely.
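The begin/tell/commit/rollback lifecycle can be sketched with a staged-write buffer. This toy ignores isolation between concurrent writers, and its `check` hook is an invented stand-in for the contradiction detector.

```python
import itertools

class KB:
    """Toy KB: writes inside a transaction stay invisible until commit."""
    def __init__(self, check=lambda facts: True):
        self.facts = set()
        self.staged = {}                       # tx_id -> pending facts
        self.check = check                     # stand-in contradiction detector
        self._ids = itertools.count(1)

    def begin(self):
        tx = f"tx_{next(self._ids)}"
        self.staged[tx] = []
        return tx

    def tell(self, fact, tx=None):
        if tx is None:
            self.facts.add(fact)               # autocommit path
        else:
            self.staged[tx].append(fact)       # buffered, invisible to readers

    def commit(self, tx):
        candidate = self.facts | set(self.staged.pop(tx))
        if not self.check(candidate):          # reject contradictory states whole
            raise ValueError("commit rejected: contradiction detected")
        self.facts = candidate                 # all-or-nothing swap

    def rollback(self, tx):
        self.staged.pop(tx)                    # pending writes simply vanish

# constraint: an order cannot be both shipped and cancelled
no_conflict = lambda fs: not {("cancelled", "order_99"),
                              ("shipped", "order_99")} <= fs
kb = KB(check=no_conflict)
tx = kb.begin()
kb.tell(("approved", "order_99"), tx)
kb.tell(("shipped", "order_99"), tx)
seen_before_commit = ("approved", "order_99") in kb.facts   # still invisible
kb.commit(tx)
```

Hypothetical reasoning falls out for free: stage a what-if world in a transaction, inspect it, then `rollback` and the shared KB never saw it.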

Operations Layer

Production-Grade From the Start

Nocturnus is not a prototype or a research engine. It's designed for production deployment. Every mutation is appended to a Write-Ahead Log before it's applied to the in-memory store — so a crash at any point leaves you with a recoverable state. Periodic full-state snapshots keep recovery fast.

Leader/follower replication streams the WAL to read replicas, letting you scale query throughput horizontally. Prometheus metrics at /metrics cover 20+ signals — fact operations, inference latency, MCP tool call rates, memory pressure, LLM extraction calls. Grafana dashboards included.

WAL + Snapshots
Crash recovery
Leader/Follower
Read scaling
Prometheus metrics
20+ signals
Kubernetes-ready
YAML included
/health + /health/ready
Probe endpoints
AES-256 at rest
Optional encryption
deployment — production stack
# Docker Compose — up in 30 seconds
docker compose up -d
✓ Server live on :9300
✓ WAL initialised at ./data/wal
✓ Prometheus at :9300/metrics
# With monitoring stack
docker compose --profile monitoring up -d
✓ Grafana at :3000 (dashboards pre-loaded)
✓ Prometheus scraping :9300/metrics
# Kubernetes liveness + readiness
livenessProbe: GET /health
readinessProbe: GET /health/ready
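The WAL-then-apply discipline is easy to see in miniature. This toy replays a JSON-lines log into a set of facts; the record format is invented for the example, and the real engine also takes periodic snapshots so replay stays short.

```python
import json, os, tempfile

class WAL:
    """Toy write-ahead log: every mutation is durable before it is applied."""
    def __init__(self, path):
        self.path = path

    def append(self, op, fact):
        with open(self.path, "a") as f:
            f.write(json.dumps({"op": op, "fact": fact}) + "\n")
            f.flush()
            os.fsync(f.fileno())   # on disk before the in-memory store changes

    def replay(self):
        """Crash recovery: rebuild the store by re-applying the log in order."""
        facts = set()
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["op"] == "tell":
                        facts.add(tuple(rec["fact"]))
                    elif rec["op"] == "forget":
                        facts.discard(tuple(rec["fact"]))
        return facts

path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = WAL(path)
wal.append("tell", ["parent", "alice", "bob"])
wal.append("tell", ["parent", "bob", "charlie"])
wal.append("forget", ["parent", "alice", "bob"])
recovered = wal.replay()   # the state a restarted server would see
```

Because every mutation hits the log before the store, a crash between the two leaves at worst one already-logged operation to re-apply, never a lost write.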

Connects to Everything in the Agent Ecosystem

One running server. Every protocol. Your agents don't need to change — they just point to Nocturnus.

MCP (Model Context Protocol)

2025-11-25 spec

HTTP + SSE transport. 16 tools exposed — facts, rules, inference, memory lifecycle, aggregation, and scope management. Any MCP-compatible agent, IDE, or framework connects with a few lines of JSON config.

// .cursor/mcp.json
{
  "mcpServers": {
    "nocturnus": {
      "url": "http://localhost:9300/mcp/sse"
    }
  }
}

A2A (Agent-to-Agent Protocol)

Discovery

Nocturnus publishes an A2A Agent Card at /.well-known/agent.json describing its capabilities. Other agents in your ecosystem can discover and interact with it automatically — no manual coordination.

GET /.well-known/agent.json
→ Agent Card: capabilities, endpoints,
  supported protocols, authentication

Python SDK + LangChain

pip install nocturnusai

Async and sync clients. Six pre-built LangChain tools (tell, ask, teach, forget, recall, context) ready to drop into any agent executor. Full Pydantic models for every DTO.

from nocturnusai.langchain import (
    get_nocturnusai_tools
)

tools = get_nocturnusai_tools(client)
# → tell, ask, teach, forget,
#   recall, context (6 tools)

Your agents are ready for better infrastructure.

Start with docker compose up and have your first fact stored in under a minute. No schemas, no migrations, no infra team required.