LLM Integration
Use LLMs to extract facts from natural language, synthesize grounded answers, and cut context costs by 97% with goal-driven optimization.
Supported Providers
Nocturnus supports 5 LLM providers. Set one environment variable to enable:
| Provider | Environment Variable | Models |
|---|---|---|
| Anthropic | ANTHROPIC_API_KEY | Claude Sonnet, Haiku, Opus |
| OpenAI | OPENAI_API_KEY | GPT-4o, GPT-4, GPT-3.5 |
| Google | GOOGLE_API_KEY | Gemini Pro, Flash |
| Ollama | LLM_BASE_URL | Any local model (Llama, Mistral, etc.) — set to http://localhost:11434/v1 |
| OpenAI-Compatible | LLM_BASE_URL + LLM_API_KEY | Groq, Mistral, DeepSeek, etc. |
Nocturnus checks for keys in the order listed above and uses the first one found.
Fact Extraction — POST /extract
Feed Nocturnus plain English text and the LLM extracts structured facts (and optionally rules). With assert: true, extracted facts are automatically stored in the knowledge base.
Request
{
"text": "Acme Corp is on the enterprise plan and has been a customer since 2019. They are based in Austin, Texas.",
"assert": true,
"rules": false,
"scope": null,
"context": null
}

| Field | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | Plain text to extract facts from |
| assert | boolean | false | Auto-store extracted facts in the KB |
| rules | boolean | false | Also extract logical rules from the text |
| scope | string? | null | Scope for asserted facts |
| context | string? | null | Additional context to help the LLM interpret the text |
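A request like the one above can be sent with a small client helper. This sketch uses only the standard library; the base URL is an assumption, so substitute wherever your Nocturnus instance is listening:

```python
import json
import urllib.request

def build_extract_payload(text, *, assert_facts=False, rules=False,
                          scope=None, context=None):
    """Build the documented /extract request body.

    `assert` is a reserved word in Python, so the keyword argument is
    named assert_facts and mapped back to the wire field here.
    """
    return {"text": text, "assert": assert_facts, "rules": rules,
            "scope": scope, "context": context}

def extract(text, base_url="http://localhost:8080", **kwargs):
    req = urllib.request.Request(
        f"{base_url}/extract",
        data=json.dumps(build_extract_payload(text, **kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Against a running server:
# result = extract("Acme Corp is on the enterprise plan.", assert_facts=True)
# print(result["facts"])
```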
Response
{
"facts": [
{ "predicate": "subscription_tier", "args": ["acme_corp", "enterprise"] },
{ "predicate": "customer_since", "args": ["acme_corp", "2019"] },
{ "predicate": "location", "args": ["acme_corp", "austin_texas"] }
],
"rules": [],
"asserted": true,
"provider": "anthropic",
"model": "claude-sonnet-4-20250514"
}

Batch Extraction
Process multiple texts in one call with POST /extract/batch:
{
"texts": [
"Alice is a senior engineer at Acme.",
"Bob manages the sales team."
],
"assert": true
}

Answer Synthesis — POST /synthesize
Ask a natural language question. Nocturnus translates it into logic queries, finds matching facts, and uses the LLM to synthesize a grounded answer — with full derivation showing which facts were used.
Request
{
"question": "What plan is Acme Corp on and where are they located?",
"scope": null
}

Response
{
"answer": "Acme Corp is on the enterprise plan and is based in Austin, Texas.",
"derivation": [
{ "fact": "subscription_tier(acme_corp, enterprise)", "type": "fact_match" },
{ "fact": "location(acme_corp, austin_texas)", "type": "fact_match" }
],
"missingContext": "",
"confidence": 0.95,
"queriesExecuted": [
"subscription_tier(acme_corp, ?tier)",
"location(acme_corp, ?loc)"
],
"provider": "anthropic",
"model": "claude-sonnet-4-20250514"
}

The derivation field shows exactly which stored facts supported the answer. If no facts match, the missingContext field explains what's missing — the LLM won't make up an answer.
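One consumption pattern a client might follow: prefer missingContext when it is non-empty (the KB had no supporting facts), otherwise show the answer with its grounding. A sketch using the documented response fields:

```python
def render_synthesis(resp: dict) -> str:
    """Turn a /synthesize response into display text, surfacing grounding.

    When missingContext is non-empty, no facts matched, so we show
    the gap instead of an answer.
    """
    if resp.get("missingContext"):
        return f"Cannot answer yet: {resp['missingContext']}"
    facts = "; ".join(step["fact"] for step in resp.get("derivation", []))
    return f"{resp['answer']} [confidence {resp['confidence']:.2f}; from: {facts}]"

grounded = {
    "answer": "Acme Corp is on the enterprise plan.",
    "derivation": [{"fact": "subscription_tier(acme_corp, enterprise)",
                    "type": "fact_match"}],
    "missingContext": "",
    "confidence": 0.95,
}
print(render_synthesis(grounded))
```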
How Synthesis Works
- Schema discovery — Nocturnus finds all predicates in the KB
- Query translation — the LLM maps the question to logical query patterns
- Fact retrieval — queries are executed via direct lookup and backward chaining
- Answer synthesis — the LLM composes a natural language answer from the matched facts
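The four steps above can be sketched in miniature. The helper logic here stands in for Nocturnus internals (the real system uses the LLM for steps 2 and 4 and backward chaining in step 3), so treat this as an illustration of the data flow, not the implementation:

```python
def synthesize(question: str, kb: dict[str, list[tuple]]) -> dict:
    # 1. Schema discovery: every predicate currently in the KB.
    schema = sorted(kb)
    # 2. Query translation: the LLM would map the question onto these
    #    predicates; here we fake it with keyword overlap.
    queries = [p for p in schema if p.split("_")[0] in question.lower()]
    # 3. Fact retrieval: direct lookup (backward chaining elided).
    matched = [(p, args) for p in queries for args in kb[p]]
    # 4. Answer synthesis: the LLM would phrase these facts; we just list them.
    answer = "; ".join(f"{p}({', '.join(a)})" for p, a in matched)
    return {"answer": answer, "queriesExecuted": queries}

kb = {"subscription_tier": [("acme_corp", "enterprise")],
      "location": [("acme_corp", "austin_texas")]}
print(synthesize("What is the location of acme?", kb)["answer"])
# → location(acme_corp, austin_texas)
```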
LLM + Context Optimization Loop
For multi-turn conversations, combine extraction and context APIs:
- Use /extract (or /context/ingest) to convert raw language into facts.
- Use /context/optimize with a stable sessionId to build the prompt window.
- Use /context/diff on subsequent turns so only changes are sent to the model.
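Wired together, a single conversational turn might look like the following. `post` is any callable that performs the HTTP call; the endpoint paths follow the docs, but the `goal` field in the optimize payload is an assumption for illustration:

```python
def handle_turn(post, session_id: str, user_text: str, question: str) -> dict:
    """One turn of the extract → optimize → diff loop (a sketch)."""
    # 1. Turn raw language into facts, asserted into the KB.
    post("/extract", {"text": user_text, "assert": True})
    # 2. Build the prompt window for this session.
    #    ("goal" is a hypothetical field name, not from the docs.)
    window = post("/context/optimize", {"sessionId": session_id, "goal": question})
    # 3. On later turns, fetch only what changed since the last window.
    delta = post("/context/diff", {"sessionId": session_id})
    return {"window": window, "delta": delta}
```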
Configuration
Additional env vars for tuning LLM behavior:
- EXTRACTION_ENABLED=true — enable /extract and /synthesize
- LLM_MODEL=claude-sonnet-4-20250514 — override the default model
- LLM_TEMPERATURE=0.1 — lower = more deterministic extraction
Cost Optimization
Every LLM call costs money — OpenAI charges around $15 per million output tokens for GPT-4o, and Anthropic charges $15 per million for Claude Sonnet. Nocturnus reduces your bill in two ways:
- Smart extraction — /extract converts unstructured text into compact structured facts. A 500-word paragraph becomes 3-5 predicate/args pairs.
- Goal-driven context — /context/optimize delivers only the facts your agent needs for a specific question. 500 facts → 15 facts → 820 tokens instead of 150K.
At scale (1,000 requests/hour), this is the difference between $54,000/month and $240/month in LLM costs.
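The arithmetic behind that comparison can be made explicit. The per-token price below is illustrative (actual input-token prices vary by provider and change over time); with $0.50 per million tokens the 150K-token figure works out to roughly the $54,000/month quoted above:

```python
def monthly_cost(tokens_per_request: int, requests_per_hour: int,
                 price_per_million_tokens: float) -> float:
    """LLM spend per 30-day month at a steady request rate."""
    requests_per_month = requests_per_hour * 24 * 30
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens / 1_000_000 * price_per_million_tokens

# 1,000 requests/hour at an illustrative $0.50 per million tokens:
full = monthly_cost(150_000, 1_000, 0.50)   # raw 150K-token context
lean = monthly_cost(820, 1_000, 0.50)       # optimized 820-token context
print(f"${full:,.0f} vs ${lean:,.0f} per month")
# → $54,000 vs $295 per month
```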