AI Infrastructure

Kairos

A production-grade cognitive regulation system with LLM inference, RAG retrieval, and neural safety classification. Kairos uses Cognitive Load Theory (CLT) as an active control signal combined with a Mixture of Experts (MoE) architecture for intelligent, closed-loop orchestration — featuring ALETHEIA epistemic protection and autism-first adaptive design.

v0.46 Current Release
7,800+ Tests Passing
MoE Architecture
REST + WebSocket API

Core Pattern

System Architecture

A modular pipeline that consumes user text and produces intelligent, context-aware responses.

Kairos Frontend

Deterministic pipeline that consumes user text and produces a WorkOrder. Uses heuristics and pattern matching before any LLM is called.
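The Frontend stage can be pictured as a pure function from raw text to a WorkOrder. The sketch below is illustrative only: the `WorkOrder` fields, the intent patterns, and `build_work_order` are assumptions, not Kairos's actual API.

```python
from dataclasses import dataclass
import re

@dataclass
class WorkOrder:
    raw_text: str
    intent: str            # e.g. "code", "plan", "write", "chat"
    estimated_load: float  # rough cognitive-load estimate, 0.0-1.0

# Hypothetical intent heuristics; the real pipeline's patterns are not public.
_INTENT_PATTERNS = {
    "code": re.compile(r"\b(function|bug|compile|python|refactor)\b", re.I),
    "plan": re.compile(r"\b(plan|roadmap|steps|schedule)\b", re.I),
    "write": re.compile(r"\b(draft|essay|email|summari[sz]e)\b", re.I),
}

def build_work_order(text: str) -> WorkOrder:
    """Deterministic heuristics only -- no LLM is called at this stage."""
    intent = "chat"
    for name, pattern in _INTENT_PATTERNS.items():
        if pattern.search(text):
            intent = name
            break
    # Crude load proxy: longer, denser inputs score higher.
    load = min(1.0, len(text.split()) / 200)
    return WorkOrder(raw_text=text, intent=intent, estimated_load=load)
```

Because the stage is deterministic, the same input always yields the same WorkOrder, which keeps downstream routing reproducible and testable.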

Model Selection Engine

A first-principles, measurement-based system that assigns models to experts using deterministic probe suites (JSON adherence, coding sanity checks).
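A deterministic probe can be as simple as a pass/fail check on model output. The probes and scoring below are a minimal sketch under that assumption; they are not the actual Kairos probe suite.

```python
import ast
import json

def json_adherence_probe(model_output: str) -> bool:
    """Pass if the model emitted syntactically valid JSON."""
    try:
        json.loads(model_output)
        return True
    except json.JSONDecodeError:
        return False

def coding_sanity_probe(model_output: str) -> bool:
    """Pass if the emitted snippet at least parses as Python."""
    try:
        ast.parse(model_output)
        return True
    except SyntaxError:
        return False

def score_model(outputs: dict) -> float:
    """Average pass rate across deterministic probes (hypothetical scoring)."""
    probes = {"json": json_adherence_probe, "code": coding_sanity_probe}
    passed = sum(probes[name](out) for name, out in outputs.items())
    return passed / len(outputs)
```

Because every probe is deterministic, two runs against the same model output always produce the same score, so expert assignments can be audited and reproduced.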

Local Model Runtime

ModelProvider abstraction layer supporting HuggingFace, llama.cpp (GGUF), vLLM, Ollama, and cloud APIs (Groq, OpenAI, Anthropic).
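An abstraction layer like this typically means callers depend on one interface while backends stay swappable. The sketch below assumes a single `generate` method; the real ModelProvider interface may expose more (streaming, tokenization, etc.), and `EchoProvider` is a made-up stand-in.

```python
from abc import ABC, abstractmethod

class ModelProvider(ABC):
    """Uniform interface over local runtimes and cloud APIs (sketch)."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        ...

class EchoProvider(ModelProvider):
    """Trivial stand-in for tests; a real subclass would wrap
    llama.cpp, vLLM, Ollama, or a cloud API client."""

    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        return prompt[:max_tokens]

def run(provider: ModelProvider, prompt: str) -> str:
    # Callers see only the abstraction, so backends can be swapped
    # (local GGUF for privacy, cloud for scale) without code changes.
    return provider.generate(prompt)
```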

CORTEX Orchestrator

Allostatic routing that consumes WorkOrders and routes tasks through specialized Experts — with CLT headroom gating for adaptive load management.
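CLT headroom gating can be read as: route the task to an expert only if its estimated load fits within the user's remaining cognitive capacity, otherwise back off and simplify. The expert names, the defer behavior, and the headroom scale below are invented for illustration.

```python
# Hypothetical expert table; real CORTEX experts are not public.
EXPERTS = {"code": "coder_expert", "plan": "planner_expert", "write": "writer_expert"}

def route(intent: str, task_load: float, user_headroom: float) -> str:
    """Route a WorkOrder's task to an expert, deferring heavy work when
    the user's remaining cognitive headroom (0.0-1.0) is too low.
    This closes the loop: load estimates feed back into routing."""
    if task_load > user_headroom:
        return "defer_and_simplify"  # reduce load before proceeding
    return EXPERTS.get(intent, "writer_expert")
```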

ALETHEIA Epistemic Protection

Epistemic safeguards that preserve factual integrity, detect hallucinations, and keep inference truth-preserving across the entire pipeline.

Autism-First Adaptive Design

Native neurodivergent support with sensory-aware content delivery, predictable interaction patterns, and configurable cognitive load thresholds.

Key Features

Production-Ready Intelligence

Enterprise-grade features for real-world deployment.

Mixture of Experts

Unlike a monolithic LLM, Kairos splits each task into a graph of subtasks. Specialists handle planning, coding, and writing with optimized model assignments.

RAG Retrieval

Retrieval-augmented generation for grounded, context-aware responses. Kairos retrieves relevant knowledge before generating — reducing hallucination.
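The retrieve-then-generate pattern can be sketched with a toy word-overlap retriever; Kairos's actual retriever (embeddings, vector store) is not described here, so everything below is an assumption used to show the shape of the pattern.

```python
def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_prompt(query: str, corpus: list) -> str:
    """Prepend retrieved context so generation stays grounded,
    which is what reduces hallucination in RAG."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"
```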

Progressive Disclosure

Adaptive streaming of text based on real-time cognitive load metrics. Information delivered at the pace you can process.
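One simple way to pace output by load is to shrink chunk size as measured load rises. The thresholds and chunk sizes below are invented for illustration, not Kairos's real values.

```python
def chunk_stream(text: str, load: float):
    """Yield text in chunks sized by current cognitive load (0.0-1.0):
    low load -> long chunks, high load -> short ones."""
    words = text.split()
    size = 12 if load < 0.3 else 6 if load < 0.7 else 3
    for i in range(0, len(words), size):
        yield " ".join(words[i:i + size])
```

In a live system the load estimate would be re-read between chunks, so pacing adapts mid-stream rather than once per response.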

Pluggable Runtimes

HF Transformers for dev, llama.cpp for efficiency, vLLM for serving, cloud APIs for scale. Hybrid local/cloud routing for privacy.

Model Caching

Thread-safe ModelCache with memory-aware LRU eviction. Prevents 14GB reloads and manages VRAM/RAM lifecycle.
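A memory-aware LRU cache combines two ideas: track bytes used against a budget, and evict the least-recently-used entry until a new model fits. This sketch assumes that design; the real ModelCache's API and eviction details may differ.

```python
import threading
from collections import OrderedDict

class ModelCache:
    """Thread-safe LRU cache with a memory budget (illustrative)."""

    def __init__(self, budget_gb: float):
        self._budget = budget_gb
        self._used = 0.0
        self._models = OrderedDict()  # name -> (model, size_gb)
        self._lock = threading.Lock()

    def get(self, name):
        with self._lock:
            if name in self._models:
                self._models.move_to_end(name)  # mark as recently used
                return self._models[name][0]
            return None  # caller must load the model (the slow path)

    def put(self, name, model, size_gb: float):
        with self._lock:
            # Evict least-recently-used models until the new one fits.
            while self._used + size_gb > self._budget and self._models:
                _, (_, freed_gb) = self._models.popitem(last=False)
                self._used -= freed_gb
            self._models[name] = (model, size_gb)
            self._used += size_gb
```

Keeping a hot 14GB model resident is exactly the reload this avoids: a `get` hit returns instantly instead of paying the load cost again.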

Neural Safety Classification

Real-time safety classification with recall-optimized models. Crisis detection, content filtering, and configurable policy enforcement.
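"Recall-optimized" here means tuning the decision threshold so unsafe inputs are essentially never missed, at the cost of more false positives. The thresholding below is a generic sketch of that idea with synthetic scores, not the Kairos classifier.

```python
def pick_recall_threshold(scores_labels, target_recall: float = 1.0) -> float:
    """Return the highest score threshold that still flags the target
    fraction of positive (unsafe) examples. scores_labels is a list of
    (score, label) pairs where label 1 means unsafe."""
    positives = sorted(s for s, y in scores_labels if y == 1)
    if not positives:
        return 0.5  # arbitrary fallback with no positives to calibrate on
    # To catch `target_recall` of positives, the threshold must sit at
    # or below the corresponding positive score.
    idx = int((1.0 - target_recall) * len(positives))
    return positives[idx]

def is_unsafe(score: float, threshold: float) -> bool:
    return score >= threshold
```

Trading precision for recall is the standard choice for crisis detection, where a missed true positive is far costlier than an extra refusal.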

Supported Runtimes

Flexible Model Backends

Choose the right engine for your deployment scenario.

HuggingFace Transformers

Best for development and experimentation. 4-bit/8-bit quantization support.

llama.cpp (GGUF)

Maximum efficiency and broad model support. Optimized for local inference.

vLLM / Ollama

Optimized for serving and local server integration. High-throughput inference.

Cloud APIs

Groq (~500 tok/sec), OpenAI, Anthropic. Hybrid routing between local and cloud.

Installation

Get Started

Modular installation via pyproject.toml for flexible dependency management.

pip install -e ".[api]"        # FastAPI + Uvicorn
pip install -e ".[llamacpp]"   # llama.cpp backend
pip install -e ".[full]"       # all backends + API
pyproject.toml
[project.optional-dependencies]
core = [
    "transformers",
    "torch",
    "accelerate"
]
api = [
    "fastapi",
    "uvicorn",
    "pydantic",
    "httpx"
]
llamacpp = ["llama-cpp-python"]
vllm = ["httpx"]
full = ["kairos[core,api,llamacpp,vllm]"]

Observability

Integrated with W33KND

Kairos is instrumented with the Kairos Observer for real-time monitoring and analysis via the W33KND Unified Monitoring Console.

Router Decisions

Real-time tracking of expert selection for specific inputs

Latency Tracking

Processing time across the pipeline for bottleneck identification

Safety Recall Analysis

Safety-triggered refusal analysis with recall-optimized classification

Model Registry

Cost and performance tracking across model versions

Ecosystem

Powered by Kairos

Kairos is the cognitive regulation layer for the entire PRJCT LAZRUS product suite.

Power Your AI with Kairos

The adaptive intelligence at the heart of the PRJCT LAZRUS ecosystem. Contact us to learn more.