SedaSoft has spent twenty-seven years building robust infrastructure for information management - with common structure, accountability and provenance built in from the ground up. The result is a production AI platform that routes seven of eight RAG pipeline stages through local neural networks, calling a cloud LLM only when it is genuinely needed. The Efficiency Engine - the cross-system layer that enforces token discipline, carbon accountability and latency constraints before any prompt reaches the LLM - is available as a standalone component.
89.2% - LoComo conversational memory
100% - Unanswerable accuracy (4 benchmarks)
62% - Token reduction vs standard RAG
0.4 g - CO₂ saved per query
All verified via public harnesses. Full benchmark results →
Across four independent benchmarks - LifeBench, BEAM, LoComo, and MemSim - SiteEngine AI achieved 100% accuracy on unanswerable queries. Every time the relevant fact was missing from memory, the system correctly refused to answer instead of hallucinating.
This is not a coincidence. It is a design consequence of how the memory layer frames injected context - as background, not fact. Production agents that refuse when uncertain are recoverable. Agents that confabulate wrong memories are not.
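To make that framing concrete, here is a minimal Go sketch of the idea - injected memory presented as background hints with an explicit refusal instruction. The wording, function name, and sample memory are illustrative assumptions, not SedaSoft's actual prompt template.

```go
package main

import (
	"fmt"
	"strings"
)

// framePrompt injects retrieved memory as background, not asserted fact,
// and instructs the model to refuse rather than guess. The wording and
// function name are illustrative, not SedaSoft's actual template.
func framePrompt(memories []string, question string) string {
	var b strings.Builder
	b.WriteString("Background notes (may be incomplete or stale; treat as hints, not facts):\n")
	for _, m := range memories {
		b.WriteString("- " + m + "\n")
	}
	b.WriteString("\nQuestion: " + question + "\n")
	b.WriteString("If the background does not contain the answer, reply exactly: ")
	b.WriteString("\"I don't have that information.\" Do not guess.\n")
	return b.String()
}

func main() {
	fmt.Print(framePrompt(
		[]string{"User's cat is named Miso."},
		"What is the user's dog called?",
	))
}
```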
See the full benchmark results →
100% - Unanswerable accuracy
SedaSoft is an independent innovation laboratory. The principles it builds on - structure before scale, provenance before generation, accountability before deployment - predate the current AI moment by twenty-seven years. The EU AI Act has caught up to those principles. The rest of the industry is catching up now. More about sedasoft →
Existing benchmarks measure energy at the model layer. SedaSoft measures it at the application layer - where architectural decisions multiply or eliminate whatever the model saved.
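As a rough illustration of what application-layer accounting looks like, the Go sketch below converts a per-query token count into grams of CO₂. Both constants are assumptions for the sketch - inference energy per token and grid carbon intensity vary widely by model, datacentre, and region - not SedaSoft's measured figures.

```go
package main

import "fmt"

// Both constants are assumptions for this sketch, not SedaSoft's measured
// figures: inference energy per token and grid carbon intensity vary
// widely by model, datacentre, and region.
const (
	joulesPerToken = 1.0   // assumed inference energy per token (J)
	gramsCO2PerKWh = 400.0 // assumed grid carbon intensity (gCO2/kWh)
)

// gramsCO2 converts one query's token count into grams of CO2.
func gramsCO2(tokens int) float64 {
	kwh := float64(tokens) * joulesPerToken / 3.6e6 // joules -> kWh
	return kwh * gramsCO2PerKWh
}

func main() {
	baseline, routed := 4000, 1520 // 62% fewer tokens than the baseline
	fmt.Printf("baseline: %.3f g  routed: %.3f g  saved: %.3f g per query\n",
		gramsCO2(baseline), gramsCO2(routed), gramsCO2(baseline)-gramsCO2(routed))
}
```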
The Efficiency Engine uses local ONNX neural networks for classification, routing, embedding and compression. The LLM API is called only for the final generation step. The result: 62% fewer tokens per query (measured across four academic RAG datasets), 75% fewer API calls, and a methodology aligned with EU AI Act energy reporting requirements from August 2026.
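The shape of that split might look like the following Go sketch: everything before generation runs locally, and only the final stage touches the cloud LLM. The stage names beyond classification, routing, embedding, and compression (retrieve, rerank, assemble) are illustrative assumptions, not the platform's actual stage list.

```go
package main

import "fmt"

// Stage is one step of the RAG pipeline. Only the shape matters here:
// everything before generation runs locally, and only the final stage
// touches the cloud LLM.
type Stage struct {
	Name  string
	Local bool
	Run   func(in string) string
}

func run(query string, stages []Stage) string {
	s := query
	for i, st := range stages {
		target := "local ONNX"
		if !st.Local {
			target = "cloud LLM"
		}
		fmt.Printf("stage %d/%d %-9s -> %s\n", i+1, len(stages), st.Name, target)
		s = st.Run(s)
	}
	return s
}

func main() {
	local := func(in string) string { return in } // stand-in for an ONNX model call
	stages := []Stage{
		{"classify", true, local}, {"route", true, local}, {"embed", true, local},
		{"retrieve", true, local}, {"rerank", true, local}, {"compress", true, local},
		{"assemble", true, local},
		{"generate", false, func(in string) string { return "answer for: " + in }},
	}
	fmt.Println(run("What changed in Q3?", stages))
}
```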
The Efficiency Engine is available as a standalone component - independently of the full SiteEngine AI platform. If token cost, carbon accountability, or AI efficiency at the application layer is your immediate challenge, we would welcome a conversation.
62% - Average token reduction vs standard RAG
75% - Fewer cloud API calls
7 of 8 - Pipeline stages run locally
Per query - Carbon measurement at application layer
Each thesis documents one pillar of our production infrastructure - its architecture, methodology, and empirical results - written in the style of academic research and benchmarked against external, reproducible datasets.
Multi-tenant RAG platform with cognitive AI, PAD emotional model, Ebbinghaus memory, and the first production implementation of Communication Accommodation Theory.
Read thesis →
Cross-system architecture for cost-aware, carbon-reduced AI. First per-query carbon accounting framework at the application layer. Self-regulating health gating.
Read thesis →
Multi-format document ingestion engine. Content-aware chunking across 12 formats. Benchmarked against HotpotQA, SQuAD, FinQA, and three other public datasets.
Read thesis →
Atomic document staging and promotion for production RAG. Transactional ingestion with rollback, audit trails, and expert promotion. Evaluated on 35,000+ document sections.
Read thesis →
An information management platform that anticipated declarative configuration, page inheritance, and content-type abstraction by more than a decade. Still running on the same architecture principles 27 years on.
Read thesis →
All five theses with abstracts, key contributions, and reading links.
The SiteEngine AI cognitive system is grounded in established psychological models rather than engineered from intuition. The PAD emotional model governs emotional state. Ebbinghaus forgetting curves govern memory. Communication Accommodation Theory governs how the system adapts to individual users over time.
PAD Emotional Model
Pleasure, Arousal, Dominance - text-based emotional state modelling. No biometrics. EU AI Act compliant by design.
Ebbinghaus Memory Decay
Forgetting curves fade outdated context naturally. The system remembers what matters and releases what has become stale (a minimal decay sketch follows this list).
Communication Accommodation
First production implementation of Communication Accommodation Theory in an AI system. The platform adjusts register, pace, and depth to each user over time.
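A minimal Go sketch of the Ebbinghaus idea, assuming the simple exponential form R = exp(-t/S). The stability and keep threshold here are illustrative, not the platform's tuned parameters.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// retention implements the simple Ebbinghaus form R = exp(-t/S), where t is
// the time since a memory was last reinforced and S its stability. The
// values used in main are illustrative, not the platform's tuned parameters.
func retention(age, stability time.Duration) float64 {
	return math.Exp(-age.Hours() / stability.Hours())
}

func main() {
	stability := 72 * time.Hour
	for _, age := range []time.Duration{time.Hour, 24 * time.Hour, 14 * 24 * time.Hour} {
		r := retention(age, stability)
		fmt.Printf("age %8s  retention %.2f  keep=%v\n", age, r, r > 0.35)
	}
}
```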
SiteEngine AI is approximately 60,000 lines of Go - not Python, not a framework assembled from libraries. Compiled binaries. Multi-tenant isolation. 17-stage hybrid pipeline. The platform is built the way infrastructure should be built: for reliability first, performance second, and convenience last.
17-stage hybrid pipeline
Within the eight-stage RAG flow, local ONNX neural networks handle stages 1-7; stage 8 - generation - calls the LLM. Everything else stays inside your perimeter.
Dgraph knowledge graph
Entity-relationship mapping across the full document corpus. Cross-document connections that vector search alone cannot find.
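For flavour, here is a hedged sketch of the kind of two-hop traversal a knowledge graph enables, written against Dgraph's Go client (dgo). The predicate names (entity.name, mentions, doc.title) are assumptions for illustration; SiteEngine's actual schema is not published here.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// Two-hop traversal: entity -> documents that mention it -> other
	// entities in those documents. The predicates are assumed for this
	// sketch; they are not SiteEngine's published schema.
	q := `query related($name: string) {
	  related(func: eq(entity.name, $name)) {
	    entity.name
	    ~mentions {
	      doc.title
	      mentions { entity.name }
	    }
	  }
	}`
	resp, err := dg.NewReadOnlyTxn().QueryWithVars(context.Background(), q,
		map[string]string{"$name": "Efficiency Engine"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(resp.Json))
}
```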
80+ MCP tools for Claude Desktop
Full Model Context Protocol integration, giving Claude Desktop direct access to the SiteEngine AI platform and knowledge graph.
The /clear problem is real. Every time you start a new Claude Code session or hit /clear mid-session, you lose the context you spent the last hour building.
Hydrate is a thin integration layer that captures a project's context automatically and injects it into your next session - in Claude Code, VS Code Copilot, Mistral or any MCP client. It is the first tool to make memory portable across sessions, tools, and models; it delivers aggressive token compression; and it runs on the same SiteEngine AI infrastructure that scored 89.2% on LoComo.
In Claude Code, three small binaries ride on the existing hook system. A Stop hook captures context before /clear. A prompt hook enriches every turn with relevant memory. A /hydrate command restores context selectively after clearing. All three are stateless, removable, and designed to be deleted when they are no longer needed.
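A minimal sketch of what such a Stop-hook binary could look like, assuming the documented Claude Code hook input (a JSON payload on stdin carrying a session ID and a transcript path). The snapshot location and naming below are assumptions, not Hydrate's actual behaviour.

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"os"
	"path/filepath"
)

// hookInput mirrors the JSON payload Claude Code passes to hooks on stdin.
// Only the two fields this sketch needs are decoded.
type hookInput struct {
	SessionID      string `json:"session_id"`
	TranscriptPath string `json:"transcript_path"`
}

func main() {
	var in hookInput
	if err := json.NewDecoder(os.Stdin).Decode(&in); err != nil {
		log.Fatal(err)
	}

	src, err := os.Open(in.TranscriptPath)
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	// Snapshot the transcript before /clear wipes the session. The snapshot
	// location and naming are assumptions, not Hydrate's actual behaviour.
	dst, err := os.Create(filepath.Join(os.TempDir(), "hydrate-"+in.SessionID+".jsonl"))
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	if _, err := io.Copy(dst, src); err != nil {
		log.Fatal(err)
	}
}
```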
"Get Opus-quality responses at Haiku-level token cost. Move context between models. Three thin hooks. All removable."
Full - summary_32 + all facts + preferences
Economy - summary_16 + top 5 facts
Turbo - summary_16 + top 2 facts
Turbo mode makes Opus cheaper per turn than Haiku at full context.
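Expressed as data, the three modes above might look like the Go sketch below. The struct and field names are assumptions for illustration; only the summary_32/summary_16 identifiers come from the table itself.

```go
package main

import "fmt"

// Mode mirrors the three hydration presets in the table above. The struct
// and field names are assumptions for illustration. TopFacts == -1 means
// "all facts".
type Mode struct {
	Summary     string
	TopFacts    int
	Preferences bool
}

var modes = map[string]Mode{
	"full":    {Summary: "summary_32", TopFacts: -1, Preferences: true},
	"economy": {Summary: "summary_16", TopFacts: 5},
	"turbo":   {Summary: "summary_16", TopFacts: 2},
}

func main() {
	for name, m := range modes {
		fmt.Printf("%-8s %s, facts=%d, prefs=%v\n", name, m.Summary, m.TopFacts, m.Preferences)
	}
}
```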
SedaSoft is interested in research collaboration, academic partnership and conversations with organisations working on adjacent problems in AI efficiency, responsible deployment, or cognitive systems architecture.