AI Assistant Internals

This page describes the internals of InterGen, the local AI assistant built into InterGenOS, and InterGen Sentinel, the security layer that guards everything InterGen touches. For the user-facing “what it does and how to set it up” view, start with the FAQ.

InterGen runs entirely on the local machine, with no cloud dependency by default. It is a conversational interface for system administration, configuration, and coding. The design follows the same posture as the rest of the system: a machine you understand, can modify, and can trust.

Design principles

Local-first. Inference runs on local hardware by default, so your data stays on your machine.
Hardware-tiered. InterGen detects the host’s RAM and GPU and selects an appropriately sized model automatically.
Security-first. Every proposed action passes through a safety classifier, and destructive or security-bypassing operations are refused outright.
Predictable routing. Requests flow through a deterministic, priority-ordered chain that resolves common queries cheaply instead of handing every interaction to the model.

InterGen ships in the ai tier of the build, the smallest of the system’s six package tiers. (The build assembles roughly 857 packages across toolchain, core, base, desktop, extra, and ai as of June 2026; these counts drift over time.)

Model catalog and hardware tiers

InterGen scales to the hardware it detects. intergen/hardware.py probes RAM and GPU and assigns a tier; intergen/model_manager.py holds the canonical model catalog and downloads, verifies, and selects the model for that tier; intergen/llama_manager.py manages the llama-server subprocess (from llama.cpp) that serves the selected model over a local HTTP API.

The shipping tiers, all built on Qwen models:

Tier	Model	Approx. size	Selected when	Role
1 — Basic	Qwen3.5-2B Q4_K_M	~1.5 GB	under 8 GB RAM	semantic matching, system queries, keyword extraction; not complex code generation
2 — Standard	Qwen3.5-9B Q4_K_M	~5.5 GB	8 to 15 GB RAM	the default daily driver: coding, configuration, reasoning
3 — Advanced	Qwen3.5-35B-A3B Q4_K_M (MoE)	~21 GB	16 GB+ RAM with a discrete GPU	deep multi-file analysis, complex architectural reasoning

On a Tier 2 machine without a discrete GPU, InterGen falls back to the 2B model to keep response latency usable. A small embedding model (nomic-embed-text, Apache-2.0) ships alongside every tier to power the router’s semantic-matching layer. The assistant collects zero telemetry.
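The tier table and the no-GPU fallback can be sketched as one small function. This is an illustration only: `select_model` and its parameters are hypothetical names, not the API of intergen/hardware.py or intergen/model_manager.py. The table does not say what happens at 16 GB+ without a discrete GPU, so the sketch assumes such a host is treated as Tier 2.

```python
TIER_1_MODEL = "Qwen3.5-2B Q4_K_M"
TIER_2_MODEL = "Qwen3.5-9B Q4_K_M"
TIER_3_MODEL = "Qwen3.5-35B-A3B Q4_K_M"


def select_model(ram_gb: float, has_discrete_gpu: bool) -> str:
    """Pick a model name from detected RAM and GPU (hypothetical helper)."""
    if ram_gb < 8:
        return TIER_1_MODEL
    if ram_gb >= 16 and has_discrete_gpu:
        return TIER_3_MODEL
    # Everything else is treated as Tier 2.
    # ASSUMPTION: this includes 16 GB+ hosts that lack a discrete GPU.
    if not has_discrete_gpu:
        # Documented fallback: Tier 2 without a discrete GPU uses the 2B model.
        return TIER_1_MODEL
    return TIER_2_MODEL
```

For example, `select_model(12, True)` returns the 9B model, while `select_model(12, False)` returns the 2B model.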

The priority router

intergen/router.py is a priority-ordered routing chain. Rather than sending every prompt straight to the model, the router tries to satisfy each request with the cheapest, most predictable method first and escalates only when it has to.

Priority 0 — Decomposition. Detects compound requests (“update the system, then restart the web server”), splits the prompt into sub-tasks, and routes each one in turn.
Priority 1 — Keyword/regex match. Fast pattern matches for common system commands (“what’s my IP?”, “check disk space”) dispatch directly to a built-in tool without invoking the model.
Priority 2 — Semantic embedding match. Lightweight embedding search against a pre-computed catalog of capabilities; a high-confidence match dispatches to the corresponding tool.
Priority 3 — LLM tool calling. For Tier 2 and above, the query goes to the model with a schema of available system tools. The model selects a tool and arguments; the router executes the call and synthesizes the result.
Priority 4 — LLM free response. The fallback: the model answers conversationally from its own knowledge and the conversation context.
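The chain above can be sketched as an ordered list of handlers that is walked until one returns a result. Every name here (`route`, `decompose`, the handler functions, the `tool:` strings) is hypothetical and stands in for whatever intergen/router.py actually does. The semantic and tool-calling layers are stubbed out, and decomposition is reduced to splitting on ", then ".

```python
from typing import Callable, List, Optional, Tuple

Handler = Callable[[str], Optional[str]]

KEYWORD_TOOLS = {"check disk space": "tool:disk_usage", "what's my ip?": "tool:ip_address"}


def keyword_match(prompt: str) -> Optional[str]:
    return KEYWORD_TOOLS.get(prompt.strip().lower())


def semantic_match(prompt: str) -> Optional[str]:
    return None  # stub: a real layer would search an embedding catalog


def llm_tool_call(prompt: str) -> Optional[str]:
    return None  # stub: a real layer would ask the model to pick a tool


def llm_free_response(prompt: str) -> Optional[str]:
    return "llm:" + prompt  # the fallback always answers


CHAIN: List[Tuple[int, Handler]] = [
    (1, keyword_match), (2, semantic_match), (3, llm_tool_call), (4, llm_free_response),
]


def decompose(prompt: str) -> List[str]:
    """Priority 0, reduced to a naive split on ', then ' for illustration."""
    return [part.strip() for part in prompt.split(", then ") if part.strip()]


def route(prompt: str) -> List[Tuple[int, str]]:
    """Route each sub-task through the chain, cheapest handler first."""
    results = []
    for task in decompose(prompt):
        for priority, handler in CHAIN:
            answer = handler(task)
            if answer is not None:
                results.append((priority, answer))
                break
    return results
```

The point of the ordering is that a request such as "check disk space" never reaches the model: the first handler that can answer ends the walk.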

Safety classification

Every action the router proposes passes through the classifier in intergen/safety.py, which sorts the operation into one of three tiers:

AUTO — Read-only or harmless operations (ls, grep, systemctl status). Executed immediately, without a prompt.
CONFIRM — State-changing operations (systemctl restart, pkm install, editing a config file). The assistant pauses and asks you to approve before it runs.
BLOCKED — Destructive or security-bypassing operations (rm -rf /, reformatting the root partition). Refused outright, with an explanation.

This classification is what keeps the user in control: nothing that changes the system runs without explicit approval, and the most dangerous commands cannot run at all.
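A minimal sketch of a three-way classifier, using only the example commands listed above. `SafetyTier` and `classify` are hypothetical names, and the real rules in intergen/safety.py are certainly broader. The sketch assumes unknown commands default to CONFIRM and unparseable ones to BLOCKED.

```python
import shlex
from enum import Enum
from typing import List


class SafetyTier(Enum):
    AUTO = "auto"
    CONFIRM = "confirm"
    BLOCKED = "blocked"


READ_ONLY_COMMANDS = {"ls", "grep"}


def _is_recursive_rm_of_root(argv: List[str]) -> bool:
    if argv[0] != "rm":
        return False
    short_flags = "".join(a[1:] for a in argv[1:] if a.startswith("-") and not a.startswith("--"))
    recursive = "r" in short_flags or "R" in short_flags or "--recursive" in argv
    return recursive and "/" in argv[1:]


def classify(command: str) -> SafetyTier:
    try:
        argv = shlex.split(command)
    except ValueError:
        return SafetyTier.BLOCKED  # ASSUMPTION: unparseable input fails closed
    if not argv:
        return SafetyTier.CONFIRM
    if _is_recursive_rm_of_root(argv):
        return SafetyTier.BLOCKED
    if argv[0] in READ_ONLY_COMMANDS or argv[:2] == ["systemctl", "status"]:
        return SafetyTier.AUTO
    # ASSUMPTION: anything not known to be read-only needs approval.
    return SafetyTier.CONFIRM
```

Defaulting the unknown case to CONFIRM rather than AUTO matches the guarantee stated above: nothing that might change the system runs without approval.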

Desktop and tool integration

InterGen exposes its capabilities to the GNOME 49 / Wayland desktop through a D-Bus service (intergen/dbus_daemon.py), so other applications can request completion, summarization, or semantic search over IPC. The D-Bus surface is deliberately narrow: only a small set of vetted interfaces is exposed, which prevents a local unprivileged application from driving arbitrary code execution through dbus-send.

InterGen is also a Model Context Protocol (MCP) client (intergen/mcp_client.py). It can connect to local MCP servers to acquire new capabilities or query data sources, while preserving the boundary between the assistant’s core runtime and the tool-execution environment.

Conversational context is split between intergen/memory.py — a user-controlled store of persistent facts plus a rolling window of recent turns — and intergen/state_cache.py, which caches recent query results so identical prompts are not recomputed. Facts are added and removed by explicit request, stored data is fully inspectable, and all of it is serialized and persisted locally.
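The caching idea can be sketched as a small bounded map from prompt to result. `StateCache` here is a hypothetical class written for illustration. The eviction policy (least recently used, 128 entries) is an assumption, and persistence to disk is omitted.

```python
from collections import OrderedDict
from typing import Callable


class StateCache:
    """Bounded cache so an identical prompt is not recomputed (illustrative)."""

    def __init__(self, max_entries: int = 128) -> None:
        self._entries: "OrderedDict[str, str]" = OrderedDict()
        self._max_entries = max_entries

    def get_or_compute(self, prompt: str, compute: Callable[[str], str]) -> str:
        if prompt in self._entries:
            self._entries.move_to_end(prompt)  # mark as recently used
            return self._entries[prompt]
        result = compute(prompt)
        self._entries[prompt] = result
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry
        return result
```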

InterGen Sentinel — the security layer

InterGen Sentinel is the security scanner that guards InterGen’s interactions with the outside world. It defends both the user and the machine whenever the assistant crosses a trust boundary: an MCP server, a web fetch, a file write, or an outbound request to a frontier model. Sentinel has four cooperating parts.

1. A pluggable scanner engine

The scanner (intergen/scanner/) inspects one piece of content travelling one direction across a trust boundary and returns a verdict. The core interface lives in intergen/interfaces/scanner.py:

ScanDirection — EGRESS (arguments leaving the machine; exfiltration risk) or INGRESS (content arriving before it re-enters the model context; injection risk).
ScanDisposition — ALLOW / FLAG / BLOCK, severity-ordered. FLAG means “suspicious, hold for human review”; BLOCK means “high-confidence malicious, hard refuse.” When multiple verdicts merge, the most severe wins (default-deny).
ScanContext and ScanVerdict carry the surface, direction, and tool name alongside a reason, confidence score, and originating scanner.
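These types might look roughly like the following. The four type names come from the description above; the field names, the integer values, and the `merge` helper are assumptions made for illustration, not the contents of intergen/interfaces/scanner.py.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Iterable


class ScanDirection(Enum):
    EGRESS = "egress"    # leaving the machine: exfiltration risk
    INGRESS = "ingress"  # arriving content: injection risk


class ScanDisposition(IntEnum):
    # Severity-ordered so that max() picks the most severe verdict.
    ALLOW = 0
    FLAG = 1
    BLOCK = 2


@dataclass(frozen=True)
class ScanContext:
    surface: str
    direction: ScanDirection
    tool_name: str


@dataclass(frozen=True)
class ScanVerdict:
    disposition: ScanDisposition
    reason: str
    confidence: float
    scanner: str


def merge(verdicts: Iterable[ScanVerdict]) -> ScanVerdict:
    """Most severe verdict wins (hypothetical helper; needs at least one verdict)."""
    return max(verdicts, key=lambda v: v.disposition)
```

Making the disposition an ordered integer enum keeps the merge rule to a single `max` call.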

ScannerPolicy (intergen/scanner/policy.py) composes an always-on deterministic floor with an optional deeper scanner:

LocalRulesScanner always runs first as the deterministic floor.
A BLOCK from the floor short-circuits without spending the deeper scan.
ALLOW at baseline depth passes through.
A FLAG, or any scan configured at deep depth, escalates to the configured deeper scanner.
Most-severe disposition wins. A scanner that errors fails closed to FLAG — never a silent ALLOW.
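The composition rules above can be sketched as one function over two scanner callables. `Disposition`, `evaluate`, and `run_fail_closed` are hypothetical names and signatures, not those of intergen/scanner/policy.py. The behavior when no deeper scanner is configured is an assumption.

```python
from enum import IntEnum
from typing import Callable, Optional


class Disposition(IntEnum):
    ALLOW = 0
    FLAG = 1
    BLOCK = 2


Scanner = Callable[[str], Disposition]


def run_fail_closed(scanner: Scanner, content: str) -> Disposition:
    """A scanner that errors yields FLAG, never a silent ALLOW."""
    try:
        return scanner(content)
    except Exception:
        return Disposition.FLAG


def evaluate(content: str, floor: Scanner, deeper: Optional[Scanner] = None,
             deep: bool = False) -> Disposition:
    floor_result = run_fail_closed(floor, content)
    if floor_result == Disposition.BLOCK:
        return floor_result  # short-circuit: do not spend the deeper scan
    if floor_result == Disposition.ALLOW and not deep:
        return floor_result  # baseline depth passes through
    if deeper is None:
        return floor_result  # ASSUMPTION: nothing to escalate to
    # FLAG, or deep depth: escalate, then keep the most severe result.
    return max(floor_result, run_fail_closed(deeper, content))
```

Note that a deeper ALLOW cannot clear a floor FLAG, because the most severe disposition wins.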

The three scanners:

Scanner	Local?	Role
`LocalRulesScanner`	yes	always-on deterministic floor: pattern/heuristic rules, no model, no network
`LocalQwenScanner`	yes	on-device llama.cpp Qwen classifier; the default deeper scanner, still no network
`CloudScanner`	no	opt-in deep tier wrapping a vendor-neutral cloud adapter; off unless you opt in

The default configuration scans entirely on-device with the rules floor plus the local Qwen classifier. The deep cloud tier is opt-in and backed by one of six providers: Claude (Anthropic), Gemini (Google), Copilot (Microsoft), ChatGPT (OpenAI), Grok (xAI), or DeepSeek. No cloud provider is configured by default.

2. A single dispatch chokepoint

Every tool call and MCP interaction is dispatched through ToolRegistry.execute (intergen/tool_registry.py), where Sentinel wires in the ScannerPolicy. Scanning is therefore structural, not per-call opt-in: arguments are egress-scanned before they leave toward a surface, returned content is ingress-scanned before it re-enters the model context, a BLOCK refuses or withholds, and a FLAG is raised to a human review modal. The scan composes with the existing provenance gate and spotlight at the same chokepoint, so three defenses layer on the one path everything must traverse.

Turning scanning off requires the human-authenticated path, and the scan policy sits in the protected configuration set that the assistant itself can never edit.
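A minimal sketch of scanning at a single chokepoint, assuming a `scan(direction, content)` callable that returns "ALLOW", "FLAG", or "BLOCK" and a `review` callable that stands in for the human review modal. The function and exception names are hypothetical, and the provenance gate and spotlight are not modeled.

```python
from typing import Any, Callable, Dict


class ScanRefused(Exception):
    """Raised when a scan blocks content or a reviewer rejects flagged content."""


def _gate(direction: str, content: str,
          scan: Callable[[str, str], str],
          review: Callable[[str, str], bool]) -> None:
    verdict = scan(direction, content)
    if verdict == "BLOCK":
        raise ScanRefused(direction + " content blocked")
    if verdict == "FLAG" and not review(direction, content):
        raise ScanRefused(direction + " content rejected in review")


def execute(tool: Callable[..., Any], args: Dict[str, Any],
            scan: Callable[[str, str], str],
            review: Callable[[str, str], bool]) -> Any:
    # Egress: arguments are scanned before the tool ever sees them.
    _gate("egress", repr(args), scan, review)
    result = tool(**args)
    # Ingress: the result is scanned before it can re-enter model context.
    _gate("ingress", str(result), scan, review)
    return result
```

Because every call goes through `execute`, a tool author cannot forget to scan; there is no per-call switch to leave off.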

3. Phone-A-Friend (Frontier/Cloud Escalation)

EscalationManager (intergen/escalation.py) implements Phone-A-Friend (Frontier/Cloud Escalation), the consent-first path for handing a request to a more capable frontier model when the local assistant cannot satisfy it. It is distinct from the quality fallback in llm.py, which auto-escalates after the local model fails twice. Phone-A-Friend instead recognizes that a task is multi-step, sensitive, or slightly outside scope and offers to reach your configured frontier model; nothing is sent without explicit consent.

Modes (config escalation.mode, default ASK): NEVER (offline; never offer or send) · FALLBACK (auto-escalate only on local quality-gate failure) · ASK (offer on recognition; consent before any send) · AUTO (decide by confidence, no prompt).
Hybrid recognition. A heuristic considers local confidence, multi-step signals, and query type; a user-invoked affordance (a GUI button with CLI parity) bypasses the heuristic, since the user already asked.
Show-before-send. The consent modal (intergen/consent_modal.py) displays the full outbound payload before any send, so consent is informed.
Scan-on-derivation. The initial egress you explicitly authorized is trusted at source; every subsequent derived egress is egress-scanned through the same ScannerPolicy, so a BLOCK keeps secrets from shipping to the cloud.
No default provider. With none configured, escalation cannot run and offers degrade to a “configure a provider” note. InterGen ships local-only and ready.

The same six providers are available here, with API keys stored in the system keyring rather than in plain configuration.
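The mode semantics can be sketched as a single decision function. The four mode names come from the list above; `decide`, its parameters, and the returned strings are hypothetical. How a user-invoked request interacts with FALLBACK and AUTO is not specified, so the sketch ignores it in FALLBACK and treats it as a recognition in AUTO.

```python
from enum import Enum


class EscalationMode(Enum):
    NEVER = "never"
    FALLBACK = "fallback"
    ASK = "ask"
    AUTO = "auto"


def decide(mode: EscalationMode, *, provider_configured: bool,
           recognized: bool = False, quality_gate_failed: bool = False,
           user_invoked: bool = False) -> str:
    """Return 'stay_local', 'configure_provider', 'ask_consent', or 'send'."""
    if mode is EscalationMode.NEVER:
        return "stay_local"  # offline: never offer or send
    if mode is EscalationMode.FALLBACK:
        wanted = quality_gate_failed
    else:
        # A user-invoked request bypasses the recognition heuristic.
        wanted = recognized or user_invoked
    if not wanted:
        return "stay_local"
    if not provider_configured:
        return "configure_provider"  # the offer degrades to a note
    # ASK shows the consent modal first; FALLBACK and AUTO do not prompt.
    return "ask_consent" if mode is EscalationMode.ASK else "send"
```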

4. The destructive-policy never-list

An OpenPGP-signed manifest (intergen/data/destructive-policy-manifest.json, installed read-only under dm-verity) enumerates the paths on which InterGen’s AI may never perform a destructive operation. No configuration option can relax it, ever: the list exists to prevent self-tampering and to protect system survival and credential and boot integrity. Anything not on the list and not read-only under dm-verity is fair game, subject to per-capability opt-in and per-action human consent. The manifest’s system_ai category also covers InterGen’s own config and state directories, so the same enforcement protects the Sentinel, escalation, and provider settings the assistant cannot edit.

Three composing pieces:

The pure matcher (intergen/destructive_policy.py) — given a loaded manifest, decides whether a candidate path is protected. No I/O, no signature logic.
The signature-verifying loader — verifies the detached signature against the operator key over the exact bytes read (closing a verify-then-parse race) before trusting the JSON. A manifest that does not verify to the pinned operator key is not trusted; the loader fails closed to an interim floor.
The chokepoint enforcement — consults the matcher inside the file-write classifier, with a canonicalized immutable-prefix floor as defense-in-depth.
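The loader's "verify the exact bytes read" step can be sketched as follows. OpenPGP verification itself is left to a caller-supplied `verify(data, signature)` callable, since the library the real loader uses is not named here. `load_manifest` and `interim_floor` are hypothetical names.

```python
import json
from typing import Any, Callable


def load_manifest(manifest_path: str, signature_path: str,
                  verify: Callable[[bytes, bytes], bool],
                  interim_floor: Any) -> Any:
    """Return the parsed manifest, or interim_floor if anything fails."""
    try:
        with open(manifest_path, "rb") as fh:
            data = fh.read()  # read once; these bytes are all we ever trust
        with open(signature_path, "rb") as fh:
            signature = fh.read()
        if not verify(data, signature):
            return interim_floor  # fail closed: unverified is untrusted
        # Parse the same bytes that were verified, not a second read of the
        # file, so the file cannot be swapped between verify and parse.
        return json.loads(data)
    except (OSError, ValueError):
        return interim_floor  # unreadable or malformed: fail closed
```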

Match semantics resolve symlinks and .. before comparing, so a symlink or path-traversal detour cannot smuggle a write past a prefix entry, and a candidate that cannot be normalized is treated as protected (fail closed).
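A minimal sketch of these match semantics, assuming a plain list of protected prefixes. `is_protected` is a hypothetical helper. It resolves paths inline with `os.path.realpath`, which touches the filesystem; how the real code separates normalization from the I/O-free matcher is not described here.

```python
import os
from typing import List


def is_protected(candidate: str, protected_prefixes: List[str]) -> bool:
    """True if candidate resolves to a protected prefix or anything under it."""
    if "\0" in candidate:
        return True  # cannot be normalized: fail closed
    try:
        real = os.path.realpath(candidate)  # resolves symlinks and ".."
        for prefix in protected_prefixes:
            real_prefix = os.path.realpath(prefix)
            # commonpath compares whole components, so "/etc2" does not
            # match the prefix "/etc".
            if os.path.commonpath([real, real_prefix]) == real_prefix:
                return True
    except (OSError, ValueError):
        return True  # any normalization failure is treated as protected
    return False
```

Comparing whole path components rather than raw string prefixes avoids both false matches on sibling names and misses caused by symlink detours.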

The vendor-neutral cloud substrate

intergen/cloud/ is the raw-HTTP substrate the CloudScanner and Phone-A-Friend both ride on. It is vendor-neutral and SDK-free: six built-in providers (anthropic, openai, google, microsoft, deepseek, xai) plus a custom adapter for any OpenAI-compatible endpoint, all sharing a raw-HTTPS base with no vendor SDKs (a supply-chain posture that bans third-party packages from the package index). API keys are read from the keyring per call and never cached in process, and a request over a non-TLS transport is refused.
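Two of these properties, refusing non-TLS transports and reading the key per call, can be sketched with the standard library alone. `build_request` and `api_key_lookup` are hypothetical names, and the Bearer header shape is an assumption matching the OpenAI-compatible custom adapter case; built-in providers may authenticate differently.

```python
import json
import urllib.request
from typing import Any, Callable, Dict
from urllib.parse import urlsplit


def build_request(url: str, api_key_lookup: Callable[[], str],
                  payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request, refusing any scheme other than https."""
    if urlsplit(url).scheme != "https":
        raise ValueError("refusing non-TLS transport: " + url)
    # The key is fetched for this call only and is not stored anywhere else.
    api_key = api_key_lookup()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The scheme check runs before the key lookup, so a refused request never touches the keyring.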

Code map

Concern	Location
Hardware probe and tier selection	`intergen/hardware.py`
Model catalog, download, verify	`intergen/model_manager.py`
llama-server lifecycle	`intergen/llama_manager.py`
Priority router	`intergen/router.py`
Safety classifier	`intergen/safety.py`
D-Bus service	`intergen/dbus_daemon.py`
MCP client	`intergen/mcp_client.py`
Memory and state	`intergen/memory.py`, `intergen/state_cache.py`
Scanner interface (ABC, types)	`intergen/interfaces/scanner.py`
Scanner engine (3 scanners + policy)	`intergen/scanner/`
Chokepoint scan-wiring	`intergen/tool_registry.py`
Phone-A-Friend escalation	`intergen/escalation.py`
Consent modal	`intergen/consent_modal.py`
Cloud substrate	`intergen/cloud/`
Destructive-policy matcher + loader	`intergen/destructive_policy.py`
Signed never-list manifest	`intergen/data/destructive-policy-manifest.json`
