SedaSoft Ltd. · R&D Pillars

25 Years of R&D

Twenty-five years of research has produced the five pillars that now underpin SiteEngine AI: a declarative information management platform, a document ingestion engine, a transactional staging architecture, a cognitive AI platform, and the Efficiency Engine that ties them together. Each pillar exists because an earlier pillar asked a question the next one had to answer.

Two principles run through all of it: Efficiency-as-Architecture - computational cost treated as a design constraint from day one rather than an optimisation at the end - and Compliance-as-Architecture - EU AI Act and US AI framework obligations built into the pipeline, the governance dashboard and the human-in-the-loop refinement path, not bolted on afterwards.

"The purpose of these theses is not to claim novelty for its own sake, but to document what we built, why it was built that way and what the evidence tells us about its effectiveness. Science requires a record. Engineering without documentation is folklore."

- Seamus Waldron, SedaSoft Ltd.

AI Architecture · Cognitive Systems · Production

SiteEngine AI: A Multi-Tenant Retrieval-Augmented Generation (RAG) Platform with Cognitive and Emotional AI Integration

Seamus Waldron · SedaSoft Ltd. · February 2026

A comprehensive technical thesis documenting the design, implementation, and empirical evaluation of SiteEngine AI - a production-grade, multi-tenant retrieval-augmented generation platform built in approximately 60,000 lines of Go. The thesis addresses a central problem in deployed AI systems: most RAG pipelines call large language models at every stage, regardless of whether language generation is actually required. This creates what the thesis terms token debt - cumulative, unnecessary inference cost that compounds across queries.

The platform's architecture delegates seven of eight pipeline stages to local neural networks - ONNX-compiled models running on-device - reserving LLM calls exclusively for the final generation step. The result is a 30-50% reduction in token consumption and a 75% reduction in cloud API calls compared to conventional RAG implementations.

Beyond efficiency, the thesis documents the platform's cognitive architecture: an AI character system grounded in the PAD emotional model (Pleasure, Arousal, Dominance), memory systems informed by Ebbinghaus forgetting curves, and an Adaptive Relationship Theory implementation - believed to be the first production deployment of Communication Accommodation Theory in an AI system. The platform also incorporates a knowledge graph (Dgraph-based), multi-modal input handling, and a compliance architecture supporting GDPR and the EU AI Act.

Key contributions

  • First documented production implementation of Communication Accommodation Theory in an AI character system
  • Token debt framework: 30-50% token reduction, 75% fewer LLM API calls
  • Ebbinghaus-informed cognitive memory with time-decay modelling
  • 17-stage hybrid pipeline: local neural networks + selective cloud LLM
  • PAD-model emotional AI with fairness monitoring across relationship categories
Read the thesis
Efficiency · Carbon Accounting · EU AI Act

The Efficiency Engine: A Cross-System Architecture for Cost-Aware, Carbon-Reduced AI Infrastructure

Seamus Waldron · SedaSoft Ltd. · March 2026

A technical thesis on the architecture, implementation, and empirical evaluation of the Efficiency Engine - a cross-system layer that manages token consumption, processing costs, and carbon output across the entire SiteEngine AI platform. The thesis introduces a novel framing: existing AI infrastructure benchmarking measures energy consumption at the model level. What matters for policy, procurement, and planetary accounting is energy at the application layer - where architectural decisions multiply or eliminate the model-level efficiency gains.

The Efficiency Engine documents the first known production-grade, per-query carbon accounting framework for AI inference at the application layer. Using the Jegham et al. (2025) model-level benchmark - which identified Claude 3.7 Sonnet as the most eco-efficient frontier model - as a foundation, the thesis extends that methodology to measure how deployment architecture decisions compound or erode that efficiency at the point of use.
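
In its simplest form, per-query accounting at the application layer reduces to tokens × energy-per-token × grid carbon intensity. A hedged Go sketch follows; the coefficients `whPerToken` and `gCO2PerWh` are illustrative placeholders, not the calibrated values from the thesis:

```go
package main

import "fmt"

// QueryFootprint is a minimal per-query carbon ledger entry.
type QueryFootprint struct {
	PromptTokens     int
	CompletionTokens int
}

// Both coefficients below are hypothetical round numbers chosen
// for illustration only.
const (
	whPerToken = 0.0003 // energy per token, Wh
	gCO2PerWh  = 0.35   // grid carbon intensity, gCO2e/Wh
)

// GramsCO2e converts a query's token counts into an estimated
// application-layer carbon figure: tokens x energy/token x intensity.
func (q QueryFootprint) GramsCO2e() float64 {
	tokens := float64(q.PromptTokens + q.CompletionTokens)
	return tokens * whPerToken * gCO2PerWh
}

func main() {
	q := QueryFootprint{PromptTokens: 900, CompletionTokens: 300}
	fmt.Printf("estimated footprint: %.4f gCO2e\n", q.GramsCO2e())
}
```

The point of putting the ledger at the application layer is that routing decisions (local model vs cloud LLM) change the token counts before this multiplication ever happens.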

The thesis also documents the first known implementation of self-regulating AI health gating in a production RAG system: a mechanism that dynamically adjusts processing routes in response to budget constraints, latency signals, and carbon targets - without human intervention.
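
A health gate of this kind can be sketched as a pure function from signals to routes. The thresholds, signal names and three-route split below are assumptions made for illustration, not the production gating policy:

```go
package main

import "fmt"

// Route selects which processing path a query takes.
type Route int

const (
	LocalOnly Route = iota // local ONNX models only, no cloud call
	Hybrid                 // local pipeline plus selective cloud LLM
	FullCloud              // cloud LLM at generation
)

// Health carries the signals the gate reacts to.
type Health struct {
	BudgetRemaining  float64 // fraction of token budget left, 0..1
	P95LatencyMS     float64 // recent 95th-percentile latency
	CarbonOverTarget bool    // rolling carbon figure above target?
}

// Gate picks a route without human intervention: exhausted budgets
// or carbon overruns push traffic toward cheaper local routes.
func Gate(h Health) Route {
	switch {
	case h.BudgetRemaining < 0.05 || h.CarbonOverTarget:
		return LocalOnly
	case h.BudgetRemaining < 0.25 || h.P95LatencyMS > 2000:
		return Hybrid
	default:
		return FullCloud
	}
}

func main() {
	fmt.Println(Gate(Health{BudgetRemaining: 0.8, P95LatencyMS: 400}))
}
```

Keeping the gate a pure function over observed signals is what makes "self-regulating" auditable: the same inputs always produce the same routing decision.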

Key contributions

  • First production per-query carbon accounting framework at the application layer
  • First self-regulating AI health gating in a production RAG system
  • 62% average token reduction vs standard RAG baseline
  • Methodology aligned with EU AI Act energy reporting requirements (August 2026)
  • Extends Jegham et al. (arXiv:2505.09598) model-level benchmarks to the deployment layer
Read the thesis
Document Processing · RAG Pipeline · Benchmarked

Baibelfish: A Multi-Format Document Pre-Processing System for Retrieval-Augmented Generation

Seamus Waldron · SedaSoft Ltd. · February 2026

A 70,000-word technical thesis on the design and evaluation of Baibelfish - an intelligent document transformation system that serves as the ingestion layer for SiteEngine AI. The central problem the thesis addresses is that conventional RAG systems treat document ingestion as a preprocessing step: content is chunked by size or by simple delimiter rules, without any semantic understanding of the document's structure, hierarchy, or internal relationships.

Baibelfish introduces content-aware chunking: a set of extraction strategies that adapt to document type, structural signals, and semantic density - producing chunks optimised for retrieval accuracy rather than processing convenience. The system processes 12 input formats, including PDF, HTML, DOCX, CSV, and structured data files, applying format-appropriate extraction pipelines to each.
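
One natural way to express format-appropriate extraction is a strategy interface keyed on document type. The `Chunker` interface and the two toy strategies below are hypothetical simplifications of the pipelines the thesis describes - real routing would inspect structural signals, not just the filename:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Chunker is a format-specific extraction strategy.
type Chunker interface {
	Chunk(content string) []string
}

// paragraphChunker splits prose on blank lines - a crude stand-in
// for structure-aware chunking of HTML, DOCX or PDF text.
type paragraphChunker struct{}

func (paragraphChunker) Chunk(c string) []string {
	return strings.Split(c, "\n\n")
}

// rowChunker keeps tabular rows intact, one chunk per record, so
// a CSV row is never split mid-field by a size-based chunker.
type rowChunker struct{}

func (rowChunker) Chunk(c string) []string {
	return strings.Split(strings.TrimSpace(c), "\n")
}

// ForFile routes a document to a strategy by extension.
func ForFile(name string) Chunker {
	switch filepath.Ext(name) {
	case ".csv":
		return rowChunker{}
	default:
		return paragraphChunker{}
	}
}

func main() {
	chunks := ForFile("report.csv").Chunk("a,1\nb,2\n")
	fmt.Println(len(chunks)) // two rows, two chunks
}
```

The design point is that the chunk boundary follows the document's own structure, so retrieval units stay semantically whole.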

The thesis presents empirical evaluation against five public benchmark datasets - HotpotQA, SQuAD, Natural Questions, FinQA, and MultiFieldQA - with comparative analysis against LangChain, LlamaIndex, and Unstructured.io. The evaluation framework measures retrieval precision, answer accuracy, and knowledge discovery performance across each dataset.

Key contributions

  • Content-aware chunking across 12 document formats with format-specific extraction pipelines
  • Benchmarked against HotpotQA, SQuAD, NQ, FinQA, and MultiFieldQA
  • Comparative evaluation against LangChain, LlamaIndex, and Unstructured.io
  • Knowledge discovery architecture: entity extraction, cross-document relationship mapping
Read the thesis
Staging Architecture · RAG Pipeline · Benchmarked

DeepThought: An Atomic Document Staging and Promotion Architecture for Production RAG Systems

Seamus Waldron · SedaSoft Ltd. · March 2026

A technical thesis introducing the staging-and-promotion paradigm for production RAG systems. Conventional ingestion architectures move documents from raw input to retrieval index in a single, largely undifferentiated pass. DeepThought proposes an alternative: a multi-stage pipeline in which document content is progressively enriched, verified, and promoted through distinct processing stages before reaching the retrieval layer.

Each document exists at a defined stage - raw, parsed, chunked, enriched, validated, or expert-promoted - and transitions between stages are transactional. This enables rollback, audit trails, and selective reprocessing without corrupting the retrieval index. The thesis documents hybrid entity extraction across named entities, semantic relationships, and domain-specific taxonomies, with expert promotion logic that identifies documents of particularly high retrieval value.
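
The stage model lends itself to an explicit state machine in which only adjacent forward transitions are legal. A minimal Go sketch - the type names and transition table are illustrative, and a real promotion would additionally wrap the transition in a database transaction:

```go
package main

import "fmt"

// Stage is a document's position in the staging pipeline.
type Stage int

const (
	Raw Stage = iota
	Parsed
	Chunked
	Enriched
	Validated
	ExpertPromoted
)

// next defines the only legal forward transitions; rejecting
// everything else is what keeps rollback and audit trails tractable.
var next = map[Stage]Stage{
	Raw: Parsed, Parsed: Chunked, Chunked: Enriched,
	Enriched: Validated, Validated: ExpertPromoted,
}

type Doc struct {
	ID    string
	Stage Stage
}

// Promote moves a document one stage forward or fails atomically:
// on error the document's recorded stage is unchanged.
func (d *Doc) Promote(to Stage) error {
	if next[d.Stage] != to {
		return fmt.Errorf("illegal transition %d -> %d", d.Stage, to)
	}
	d.Stage = to
	return nil
}

func main() {
	d := &Doc{ID: "doc-1", Stage: Raw}
	fmt.Println(d.Promote(Parsed))    // legal: Raw -> Parsed
	fmt.Println(d.Promote(Validated)) // rejected: skipping stages
}
```

Because a failed promotion never mutates the recorded stage, selective reprocessing can restart any document from its last good state without touching the retrieval index.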

Empirical evaluation covers four benchmark corpora totalling more than 35,000 document sections. The evaluation measures retrieval precision and recall at each pipeline stage, the impact of expert promotion on answer quality, and the performance overhead of transactional staging relative to single-pass ingestion.

Key contributions

  • Staging-promotion paradigm: first documented transactional RAG ingestion architecture
  • Empirical evaluation across four benchmark corpora (35,000+ document sections)
  • Hybrid entity extraction: named entities, semantic relationships, domain taxonomies
  • Expert promotion logic with measurable retrieval quality uplift
Read the thesis
Historical · Data Architecture · In Production Since 1998

SiteEngine: A Declarative Information Management Architecture

Seamus Waldron & David Bosdet · SedaSoft Ltd. · First built 1998

A historical and technical account of SiteEngine - a declarative information management platform in continuous production since 1998. The thesis documents an architecture that anticipated several of the paradigms now considered standard in modern information management systems, including declarative page configuration, page inheritance, three-level processing, and content-type abstraction - most by more than a decade. The system still runs on the same architectural principles 27 years on.

The thesis is, in part, a record of what is possible when a system is built without the pressure to ship a minimal version and move on. SiteEngine was not a prototype. It was built once, properly, and has been running without major architectural changes - but with continuous improvements to its codebase - since a time when Internet Explorer was still a competitive browser.

The same architecture that served the early web still serves production sites today, with management-issues.com - a management and leadership publication serving 7,200+ archived articles - celebrating its 25th year in 2026.

With an entirely new codebase written in Go, SiteEngine is fully integrated with our AI tools, demonstrating that declarative architecture can adapt to new requirements whilst preserving accumulated value.

Architecture milestones

  • Declarative page configuration - adopted broadly by modern frameworks circa 2010-2015
  • Page inheritance model - predated WordPress template hierarchy by approximately a decade
  • Three-level processing architecture - separation of content, presentation and behaviour
  • In continuous production since 1998 - no major architectural changes in 27 years
Read the thesis

About this work

These theses were not written to support a funding round or to create the impression of academic credibility. They were written because we do not start building something without understanding it thoroughly - and writing a proper account of it is part of that process.

The metrics cited were gathered in production, against real workloads, with real users. The benchmarks are external and reproducible. The claims are empirically supported or explicitly qualified.

If you are a researcher, an academic institution, or an organisation working on adjacent problems and you find this work interesting, we would welcome a conversation.

Track record

  • 27 yrs - Production architecture
  • 60k+ - Lines of Go code documented
  • 35k+ - Document sections benchmarked

If this research interests you, let's talk.

We are always interested in exploring research collaboration, joint publication and conversations with organisations working on similar problems.

Start a conversation