Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

GPU Compute & AI Workloads

This section covers how InterGenOS uses the GPU: the graphics backends it can target, the compute stacks used for machine-learning work, and how the built-in AI assistant detects and scales to the hardware it finds on your machine.

InterGenOS is built from source and aligned on a single principle. That posture shapes the GPU stack the same way it shapes the rest of the system: you get a machine you understand, can modify, and can trust, with no opaque vendor blobs running outside your knowledge or consent.

InterGenOS is at version 1.0-dev (build id v1.0-dev1). The pages below describe the shipping system. Where a backend, command, or package is not yet documented in detail, the page says so plainly rather than guessing.

What ships today

The desktop is GNOME 49 on Wayland. GPU acceleration for the desktop session runs through that stack.

For compute and AI work, the system pairs a vendor-neutral default with opt-in vendor stacks. The individual pages in this section document each backend; the high-level shape is:

For the per-vendor specifics — which driver lands, how it is selected, and the exact commands — read:

How the GPU feeds the AI assistant

The clearest consumer of GPU compute in InterGenOS is InterGen, the local AI assistant. InterGen is offline-first and runs inference on local hardware by default, so your data stays on your machine. It carries zero telemetry.

InterGen is hardware-tiered. It probes the host’s RAM and GPU and selects an appropriately sized model automatically, then serves it locally over an HTTP API backed by llama.cpp. The shipping tiers use Qwen models (the qwen35 model architecture is the one InterGen’s bundled llama.cpp is built to load).

The detector reports the tier level, the detected RAM and GPU, and the recommended model and quantisation for that tier. You can see the result for your own machine with intergen tier. The tiers are:

TierHardwareModel
Tier 1< 8 GB RAM, no or integrated GPUQwen3.5-2B (Q4_K_M, ~1.5 GB)
Tier 28–15 GB RAM with a GPUQwen3.5-9B (Q4_K_M, ~5.5 GB)
Tier 316 GB+ RAM with a discrete GPUQwen3.5-35B-A3B MoE (Q4_K_M, ~21 GB)

InterGen’s bundled llama.cpp is a CPU-only build (an AVX2 instruction-set floor, no GPU offload), so the model that actually runs is governed by what the CPU can handle at a usable response latency, not by GPU acceleration. A machine without a discrete GPU — including most integrated-graphics laptops — is served the Tier 1 (2B) model even when its RAM would otherwise place it in Tier 2, because the 9B is too slow to run on CPU alone. The exception is a sufficiently powerful CPU (for example, a modern high-core-count or AI-class processor), which can carry the larger model without offload.

A small embedding model (nomic-ai/nomic-embed-text-v1.5) ships alongside the assistant to power its semantic-matching layer.

The full assistant architecture — the priority router, the AUTO/CONFIRM/BLOCKED safety classifier, D-Bus and MCP integration, and memory — is documented in the assistant section.

Security scanning and cloud escalation

Two related features sit alongside the local assistant. Both are local-by-default and consent-first.

InterGen Sentinel is a pluggable security scanner that inspects content crossing the device boundary: data coming in from external and MCP tools, and content about to be sent off-device. Its default configuration runs entirely on-device — a fast local-rules pass plus an optional deeper pass backed by a small local Qwen classifier. For deeper analysis you may opt in to a cloud scanner backed by one of six providers: Claude (Anthropic), Gemini (Google), Copilot (Microsoft), ChatGPT (OpenAI), Grok (xAI), or DeepSeek. None is configured by default.

Phone-A-Friend (Frontier/Cloud Escalation) is an optional path for handing a request to a more capable frontier model when the local assistant cannot satisfy it. It is off by default, asks before reaching out, stores API keys in the system keyring, and scans every outbound payload through Sentinel’s egress policy first. Cloud assistance is available when you ask for it, never imposed.

The rest of the platform

InterGenOS ships its own package manager (pkm), its own installer (Forge), a signed Secure Boot chain, dm-verity integrity, and UKI signing. The system is assembled across six package tiers (toolchain, core, base, desktop, ai, extra) through a 20-phase build pipeline. The package total runs to roughly 850 as of this writing; these counts drift as the system grows, so treat any figure as a snapshot rather than a fixed total.

For installation and the GPU-relevant steps of setup, see the Forge installer guide.

Where to go next