LLM Integration
Use LLMs to extract facts from natural language, synthesize grounded answers, and cut context costs by 97% with goal-driven optimization.
Supported Providers
Nocturnus supports 5 LLM providers. Set one environment variable to enable:
| Provider | Environment Variable | Models |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Claude Sonnet, Haiku, Opus |
| OpenAI | OPENAI_API_KEY | GPT-4o, GPT-4, GPT-3.5 |
| Google | GOOGLE_API_KEY | Gemini Pro, Flash |
| Ollama | LLM_BASE_URL | Any local model (Llama, Mistral, etc.) — set to http://localhost:11434/v1 |
| OpenAI-Compatible | LLM_BASE_URL + LLM_API_KEY | Groq, Mistral, DeepSeek, etc. |
Nocturnus checks for keys in the order listed above and uses the first one found.
Fact Extraction — POST /extract
Feed Nocturnus plain English text and the LLM extracts structured facts (and optionally rules). With assert: true, extracted facts are automatically stored in the knowledge base.
Request
{
"text": "Acme Corp is on the enterprise plan and has been a customer since 2019. They are based in Austin, Texas.",
"assert": true,
"rules": false,
"scope": null,
"context": null
}

| Field | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | Plain text to extract facts from |
| assert | boolean | false | Auto-store extracted facts in the KB |
| rules | boolean | false | Also extract logical rules from the text |
| scope | string? | null | Scope for asserted facts |
| context | string? | null | Additional context to help the LLM interpret the text |
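A request like the one above can be sent with a small client helper. This sketch uses only the standard library; the base URL is an assumption, so substitute wherever your Nocturnus instance is listening:

```python
import json
import urllib.request

def build_extract_payload(text, *, assert_facts=False, rules=False,
                          scope=None, context=None):
    """Build the documented /extract request body.

    `assert` is a reserved word in Python, so the keyword argument is
    named assert_facts and mapped back to the wire field here.
    """
    return {"text": text, "assert": assert_facts, "rules": rules,
            "scope": scope, "context": context}

def extract(text, base_url="http://localhost:8080", **kwargs):
    req = urllib.request.Request(
        f"{base_url}/extract",
        data=json.dumps(build_extract_payload(text, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Against a running server:
# result = extract("Acme Corp is on the enterprise plan.", assert_facts=True)
# print(result["facts"])
```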
Response
{
"facts": [
{ "predicate": "subscription_tier", "args": ["acme_corp", "enterprise"] },
{ "predicate": "customer_since", "args": ["acme_corp", "2019"] },
{ "predicate": "location", "args": ["acme_corp", "austin_texas"] }
],
"rules": [],
"asserted": true,
"provider": "anthropic",
"model": "claude-sonnet-4-20250514"
}

Batch Extraction
Process multiple texts in one call with POST /extract/batch:
{
"texts": [
"Alice is a senior engineer at Acme.",
"Bob manages the sales team."
],
"assert": true
}

Answer Synthesis — POST /synthesize
Ask a natural language question. Nocturnus translates it into logic queries, finds matching facts, and uses the LLM to synthesize a grounded answer — with full derivation showing which facts were used.
Request
{
"question": "What plan is Acme Corp on and where are they located?",
"scope": null
}

Response
{
"answer": "Acme Corp is on the enterprise plan and is based in Austin, Texas.",
"derivation": [
{ "fact": "subscription_tier(acme_corp, enterprise)", "type": "fact_match" },
{ "fact": "location(acme_corp, austin_texas)", "type": "fact_match" }
],
"missingContext": "",
"confidence": 0.95,
"queriesExecuted": [
"subscription_tier(acme_corp, ?tier)",
"location(acme_corp, ?loc)"
],
"provider": "anthropic",
"model": "claude-sonnet-4-20250514"
}

The derivation field shows exactly which stored facts supported the answer. If no facts match, the missingContext field explains what's missing — the LLM won't make up an answer.
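One consumption pattern a client might follow: prefer missingContext when it is non-empty (the KB had no supporting facts), otherwise show the answer with its grounding. A sketch using the documented response fields:

```python
def render_synthesis(resp: dict) -> str:
    """Turn a /synthesize response into display text, surfacing grounding.

    When missingContext is non-empty, no facts matched, so we show
    the gap instead of an answer.
    """
    if resp.get("missingContext"):
        return f"Cannot answer yet: {resp['missingContext']}"
    facts = "; ".join(step["fact"] for step in resp.get("derivation", []))
    return f"{resp['answer']} [confidence {resp['confidence']:.2f}; from: {facts}]"

grounded = {
    "answer": "Acme Corp is on the enterprise plan.",
    "derivation": [{"fact": "subscription_tier(acme_corp, enterprise)",
                    "type": "fact_match"}],
    "missingContext": "",
    "confidence": 0.95,
}
print(render_synthesis(grounded))
```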
How Synthesis Works
- Schema discovery — Nocturnus finds all predicates in the KB
- Query translation — the LLM maps the question to logical query patterns
- Fact retrieval — queries are executed via direct lookup and backward chaining
- Answer synthesis — the LLM composes a natural language answer from the matched facts
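The four steps above can be sketched in miniature. The helper logic here stands in for Nocturnus internals (the real system uses the LLM for steps 2 and 4 and backward chaining in step 3), so treat this as an illustration of the data flow, not the implementation:

```python
def synthesize(question: str, kb: dict[str, list[tuple]]) -> dict:
    # 1. Schema discovery: every predicate currently in the KB.
    schema = sorted(kb)
    # 2. Query translation: the LLM would map the question onto these
    #    predicates; here we fake it with keyword overlap.
    queries = [p for p in schema if p.split("_")[0] in question.lower()]
    # 3. Fact retrieval: direct lookup (backward chaining elided).
    matched = [(p, args) for p in queries for args in kb[p]]
    # 4. Answer synthesis: the LLM would phrase these facts; we just list them.
    answer = "; ".join(f"{p}({', '.join(a)})" for p, a in matched)
    return {"answer": answer, "queriesExecuted": queries}

kb = {"subscription_tier": [("acme_corp", "enterprise")],
      "location": [("acme_corp", "austin_texas")]}
print(synthesize("What is the location of acme?", kb)["answer"])
# → location(acme_corp, austin_texas)
```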
LLM + Context Optimization Loop
For multi-turn conversations, combine extraction and context APIs:
- Use /extract (or /context/ingest) to convert raw language into facts.
- Use /context/optimize with a stable sessionId to build the prompt window.
- Use /context/diff on subsequent turns so only changes are sent to the model.
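Wired together, a single conversational turn might look like the following. `post` is any callable that performs the HTTP call; the endpoint paths follow the docs, but the `goal` field in the optimize payload is an assumption for illustration:

```python
def handle_turn(post, session_id: str, user_text: str, question: str) -> dict:
    """One turn of the extract → optimize → diff loop (a sketch)."""
    # 1. Turn raw language into facts, asserted into the KB.
    post("/extract", {"text": user_text, "assert": True})
    # 2. Build the prompt window for this session.
    #    ("goal" is a hypothetical field name, not from the docs.)
    window = post("/context/optimize", {"sessionId": session_id, "goal": question})
    # 3. On later turns, fetch only what changed since the last window.
    delta = post("/context/diff", {"sessionId": session_id})
    return {"window": window, "delta": delta}
```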
Configuration
Additional env vars for tuning LLM behavior:
- EXTRACTION_ENABLED=true — enable /extract and /synthesize
- LLM_MODEL=claude-sonnet-4-20250514 — override the default model
- LLM_TEMPERATURE=0.1 — lower = more deterministic extraction
Cost Optimization
Every LLM call costs money — OpenAI charges around $15 per million output tokens for GPT-4o, and Anthropic charges $15 per million for Claude Sonnet. Nocturnus reduces your bill in two ways:
- Smart extraction — /extract converts unstructured text into compact structured facts. A 500-word paragraph becomes 3-5 predicate/args pairs.
- Goal-driven context — /context/optimize delivers only the facts your agent needs for a specific question. 500 facts → 15 facts → 820 tokens instead of 150K.
At scale (1,000 requests/hour), this is the difference between $54,000/month and $240/month in LLM costs.
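The arithmetic behind that comparison can be made explicit. The per-token price below is illustrative (actual input-token prices vary by provider and change over time); with $0.50 per million tokens the 150K-token figure works out to roughly the $54,000/month quoted above:

```python
def monthly_cost(tokens_per_request: int, requests_per_hour: int,
                 price_per_million_tokens: float) -> float:
    """LLM spend per 30-day month at a steady request rate."""
    requests_per_month = requests_per_hour * 24 * 30
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens / 1_000_000 * price_per_million_tokens

# 1,000 requests/hour at an illustrative $0.50 per million tokens:
full = monthly_cost(150_000, 1_000, 0.50)   # raw 150K-token context
lean = monthly_cost(820, 1_000, 0.50)       # optimized 820-token context
print(f"${full:,.0f} vs ${lean:,.0f} per month")
# → $54,000 vs $295 per month
```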