97% Fewer Tokens.
Better Answers. Lower Bill.
Every token you send to an LLM costs money. NocturnusAI's goal-driven context engine delivers only the facts your agent needs — cutting token spend by 97% while making every answer provable and traceable.
Here's the full architecture that makes it possible.
Hallucination
LLMs generate plausible text from statistical patterns. They cannot distinguish between what they've been trained on and what's actually true in your system right now.
No memory lifecycle
Agents accumulate context until the window fills. There's no concept of which facts are still relevant, which have expired, or which should be compressed into summaries.
Reasoning isn't sound
Vector similarity is not logical inference. RAG retrieves text that looks related — it doesn't derive what must be true given what you know. These are fundamentally different operations.
Facts don't have time
Your data changes. A customer upgrades their plan. A system goes offline. Without temporal awareness, your agent reasons over stale facts with no way to know they've expired.
No consistency guarantees
Multi-agent systems share state. Without transactional semantics, two agents making simultaneous updates can corrupt your knowledge base in ways that are invisible and hard to debug.
Zero provenance
When an agent gives a wrong answer, you can't trace why. There's no audit trail showing which facts were used, which rules fired, or which inference path led to the conclusion.
Hexastore: Six Indexes, One Query
Traditional databases make you think about indexes. Nocturnus doesn't. The Hexastore maintains six simultaneous index permutations — SPO, SOP, PSO, POS, OSP, OPS — over every fact in your knowledge base.
Whatever pattern your query presents — whether the subject, predicate, or object is bound or variable — it hits a direct index lookup. No table scans. No query planner guessing wrong. Sub-100ms retrieval regardless of which terms you specify.
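To make the idea concrete, here is a minimal Python sketch of a hexastore: six nested-dict indexes, one per (S, P, O) ordering, so any combination of bound and free terms resolves by direct lookup in the index whose prefix matches the bound terms. This is illustrative only, not NocturnusAI's production data structure.

```python
from itertools import permutations

class Hexastore:
    """Toy triple store keeping all six (S, P, O) index permutations."""
    ORDERS = [''.join(p) for p in permutations('spo')]  # spo, sop, pso, pos, osp, ops

    def __init__(self):
        # Each index: term1 -> term2 -> set(term3)
        self.idx = {order: {} for order in self.ORDERS}

    def add(self, s, p, o):
        t = {'s': s, 'p': p, 'o': o}
        for order in self.ORDERS:
            a, b, c = t[order[0]], t[order[1]], t[order[2]]
            self.idx[order].setdefault(a, {}).setdefault(b, set()).add(c)

    def query(self, s=None, p=None, o=None):
        """Any bound/free pattern becomes direct lookups in one index."""
        t = {'s': s, 'p': p, 'o': o}
        bound = [k for k in 'spo' if t[k] is not None]
        free = [k for k in 'spo' if t[k] is None]
        order = ''.join(bound + free)        # bound terms form the index prefix
        index = self.idx[order]
        out = []
        if len(bound) == 3:                  # existence check
            if t[order[2]] in index.get(t[order[0]], {}).get(t[order[1]], set()):
                out.append((s, p, o))
        elif len(bound) == 2:                # one free term to enumerate
            for c in index.get(t[order[0]], {}).get(t[order[1]], set()):
                out.append(tuple({**t, order[2]: c}[k] for k in 'spo'))
        elif len(bound) == 1:                # two free terms
            for b, cs in index.get(t[order[0]], {}).items():
                for c in cs:
                    out.append(tuple({**t, order[1]: b, order[2]: c}[k] for k in 'spo'))
        else:                                # fully unbound: enumerate everything
            for a, bs in index.items():
                for b, cs in bs.items():
                    for c in cs:
                        out.append(tuple({order[0]: a, order[1]: b, order[2]: c}[k] for k in 'spo'))
        return out
```

The trade-off is classic: six-fold write amplification buys constant-shape reads for every query pattern.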
Two Engines. One Coherent Truth.
Nocturnus runs two inference engines simultaneously. Backward chaining handles goal-driven queries: you ask a question, the SLD resolver works backwards through your rules to find if and how the answer can be proved. It returns the full derivation chain.
Forward chaining via the Rete network handles reactive inference: the moment you assert a new fact, the engine immediately computes all new conclusions that follow from your rules. Your knowledge base is always in a consistent, fully-derived state.
Both engines operate deterministically. The same facts and rules always produce the same conclusions. There is no inference drift, no probabilistic variance, no randomness.
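A naive fixpoint loop captures the deterministic forward-chaining behavior described above: assert facts, run every rule until nothing new can be derived, and the same inputs always yield the same closure. This sketch (with `?`-prefixed variables, a convention of this example only) is far simpler than a Rete network, but computes the same kind of result.

```python
def substitute(triple, binding):
    """Replace variables in a triple with their bound values."""
    return tuple(binding.get(t, t) for t in triple)

def match(pattern, fact, binding):
    """Unify one pattern against one fact, extending the binding or failing."""
    b = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith('?'):
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:
            return None
    return b

def match_all(premises, facts, binding):
    """Yield every binding that satisfies all premise patterns."""
    if not premises:
        yield binding
        return
    for fact in facts:
        b = match(premises[0], fact, binding)
        if b is not None:
            yield from match_all(premises[1:], facts, b)

def forward_chain(facts, rules):
    """Run rules to a fixpoint; deterministic given the same facts and rules."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for premises, conclusion in rules:
            for b in match_all(premises, known, {}):
                derived = substitute(conclusion, b)
                if derived not in known:
                    new.add(derived)
        if new:
            known |= new
            changed = True
    return known
```

Running it with a transitivity rule shows the reactive closure: assert two `parent` facts and every implied `ancestor` fact appears automatically.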
Truth Maintenance: Consistency Without the Bookkeeping
When you retract a fact, everything that was derived from it — directly or transitively — is automatically removed from the knowledge base. No manual cleanup. No stale conclusions. No ghost data that leads agents astray.
The ProvenanceTracker maintains a full dependency graph of every inference. When a premise disappears, it walks the graph and retracts every conclusion that depended on it. The operation is atomic.
This is the difference between a database and a reasoning system. Databases let you delete a row and leave referential integrity to you. Nocturnus maintains logical integrity automatically.
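The mechanism is easiest to see in miniature. In this hypothetical sketch (not the ProvenanceTracker's internals), each derived fact records the premises that justify it, and retracting a premise walks the dependency graph and removes every dependent transitively.

```python
class TruthMaintenance:
    """Toy justification-based retraction over a dependency graph."""
    def __init__(self):
        self.facts = set()
        self.dependents = {}   # premise -> set of facts derived from it

    def assert_fact(self, fact, premises=()):
        """Record a fact, remembering which premises it was derived from."""
        self.facts.add(fact)
        for p in premises:
            self.dependents.setdefault(p, set()).add(fact)

    def retract(self, fact):
        """Remove a fact and, transitively, everything derived from it."""
        stack = [fact]
        while stack:
            f = stack.pop()
            if f in self.facts:
                self.facts.remove(f)
                stack.extend(self.dependents.pop(f, ()))
```

Retract the root premise and the whole derivation chain vanishes with it: no stale conclusions survive.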
Facts That Know When They're True
The real world is not a static snapshot. Subscriptions expire. Sessions end. Prices change. Regulations take effect on specific dates. Every fact in Nocturnus carries validFrom, validUntil, createdAt, and ttl fields.
Set a TTL and the fact auto-expires. Define validity windows for seasonal rules. Query what was true at any arbitrary point in time. Build audit trails where you can reconstruct exactly what your agent knew at the moment it made a decision.
Temporal awareness isn't a feature layer added on top — it's woven into the atom data model, which means every query, every inference, and every export respects time automatically.
Memory That Manages Itself
Agent context windows are finite. What goes in matters enormously. Nocturnus computes a composite salience score for every fact — combining recency, access frequency, and explicit priority — and uses it to determine what deserves space in your agent's context window.
/memory/context returns the top-K most salient facts, ranked. /memory/consolidate compresses repeated episodic patterns into compact semantic summaries — the difference between "the agent remembered 47 similar login events" and "the agent remembers the login pattern." /memory/decay evicts facts that have fallen below relevance thresholds.
Real-time memory change events stream via SSE at GET /memory/events — so your orchestration layer can react the moment anything significant changes.
Atomic Reasoning for Multi-Agent Systems
In a multi-agent system, agents read and write shared state concurrently. Without transactional guarantees, two agents updating the same knowledge simultaneously can leave your KB in an inconsistent state that's invisible until something breaks.
Nocturnus gives you full ACID transactions. Begin a transaction, assert multiple facts, and they're invisible to all other agents until you commit. On commit, a contradiction detector runs — if your assertions conflict with existing rules or constraints, the transaction is rejected and nothing changes.
This also enables hypothetical reasoning: open a transaction, explore a set of conclusions, then rollback without affecting shared state. Your agent can reason "what would be true if X were the case" safely.
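The pattern is sketched below: writes go to a private staging set invisible to other readers, a commit-time check rejects contradictions, and rollback discards everything. The constraint model here is hypothetical and much simpler than a real contradiction detector.

```python
class StagedKB:
    """Toy snapshot-staged transactions over a shared fact set."""
    def __init__(self):
        self.committed = set()

    def begin(self):
        return {'added': set()}        # private per-transaction staging area

    def assert_fact(self, tx, fact):
        tx['added'].add(fact)          # invisible to other agents until commit

    def commit(self, tx, constraints=()):
        merged = self.committed | tx['added']
        for check in constraints:      # contradiction detection at commit time
            if not check(merged):
                return False           # rejected: nothing changes
        self.committed = merged
        return True

    def rollback(self, tx):
        tx['added'].clear()            # hypothetical conclusions vanish
```

"What would be true if X" is then just begin, assert, inspect, rollback; the shared state never sees the hypothesis.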
Production-Grade From the Start
Nocturnus is not a prototype or a research engine. It's designed for production deployment. Every mutation is appended to a Write-Ahead Log before it's applied to the in-memory store — so a crash at any point leaves you with a recoverable state. Periodic full-state snapshots keep recovery fast.
Leader/follower replication streams the WAL to read replicas, letting you scale query throughput horizontally. Prometheus metrics at /metrics cover 20+ signals — fact operations, inference latency, MCP tool call rates, memory pressure, LLM extraction calls. Grafana dashboards included.
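Write-ahead logging is a standard durability technique, and a minimal version fits in a few lines: every mutation is fsynced to the log before touching the in-memory store, and restart replays the log to reconstruct state. The JSON-lines file format here is this sketch's assumption, not NocturnusAI's on-disk format.

```python
import json
import os

class WalStore:
    """Toy fact store with write-ahead logging and crash recovery."""
    def __init__(self, path):
        self.path = path
        self.facts = set()
        if os.path.exists(path):       # crash recovery: replay the log
            with open(path) as f:
                for line in f:
                    self._apply(json.loads(line))
        self.log = open(path, 'a')

    def _apply(self, record):
        fact = tuple(record['fact'])
        if record['op'] == 'assert':
            self.facts.add(fact)
        else:
            self.facts.discard(fact)

    def mutate(self, op, fact):
        record = {'op': op, 'fact': list(fact)}
        self.log.write(json.dumps(record) + '\n')
        self.log.flush()
        os.fsync(self.log.fileno())    # durable on disk before applied in memory
        self._apply(record)
```

Streaming this same log to followers is what makes leader/follower replication a natural extension of the durability story.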
Connects to Everything in the Agent Ecosystem
One running server. Every protocol. Your agents don't need to change — they just point to Nocturnus.
MCP (Model Context Protocol)
Implements the 2025-11-25 MCP spec over HTTP + SSE transport. 16 tools exposed — facts, rules, inference, memory lifecycle, aggregation, and scope management. Any MCP-compatible agent, IDE, or framework connects with a two-line JSON config.
// .cursor/mcp.json
{
  "mcpServers": {
    "nocturnus": {
      "url": "http://localhost:9300/mcp/sse"
    }
  }
}
A2A (Agent-to-Agent Protocol)
Discovery is built in: Nocturnus publishes an A2A Agent Card at /.well-known/agent.json describing its capabilities. Other agents in your ecosystem can discover and interact with it automatically — no manual coordination.
GET /.well-known/agent.json → Agent Card: capabilities, endpoints, supported protocols, authentication
Python SDK + LangChain
pip install nocturnusai. Async and sync clients. Six pre-built LangChain tools (tell, ask, teach, forget, recall, context) ready to drop into any agent executor. Full Pydantic models for every DTO.
from nocturnusai.langchain import (
    get_nocturnusai_tools
)

tools = get_nocturnusai_tools(client)
# → tell, ask, teach, forget,
#   recall, context (6 tools)
Your agents are ready for better infrastructure.
Start with docker compose up and have your first fact stored in under a minute. No schemas, no migrations, no infra team required.