GPU Compute & AI Workloads

This section covers how InterGenOS uses the GPU: the graphics backends it can target, the compute stacks used for machine-learning work, and how the built-in AI assistant detects and scales to the hardware it finds on your machine.

InterGenOS is built from source around a single principle: you get a machine you understand, can modify, and can trust, with no opaque vendor blobs running outside your knowledge or consent. That posture shapes the GPU stack the same way it shapes the rest of the system.

InterGenOS is at version 1.0-dev (build id v1.0-dev1). The pages below describe the shipping system. Where a backend, command, or package is not yet documented in detail, the page says so plainly rather than guessing.

What ships today

The desktop is GNOME 49 on Wayland. GPU acceleration for the desktop session runs through that stack.

For compute and AI work, the system pairs a vendor-neutral default with opt-in vendor stacks. The individual pages in this section document each backend; the high-level shape is:

A vendor-neutral default backend that works across GPUs without committing you to a single vendor’s toolchain. See Vulkan (default, vendor-neutral).
AMD compute via a from-source ROCm path. See ROCm for AMD (automated from-source).
NVIDIA compute via an opt-in CUDA path. See CUDA for NVIDIA (opt-in mirror).

For the per-vendor specifics (which driver lands, how it is selected, and the exact commands), read the backend pages listed above and Per-GPU Driver Notes.

How the GPU feeds the AI assistant

The clearest consumer of GPU compute in InterGenOS is InterGen, the local AI assistant. InterGen is offline-first and runs inference on local hardware by default, so your data stays on your machine. It carries zero telemetry.

InterGen is hardware-tiered. It probes the host’s RAM and GPU and selects an appropriately sized model automatically, then serves it locally over an HTTP API backed by llama.cpp. The shipping tiers use Qwen models (the qwen35 model architecture is the one InterGen’s bundled llama.cpp is built to load).
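The RAM half of that probe can be illustrated on any Linux host by reading MemTotal from /proc/meminfo. This is a minimal sketch, not InterGen's actual detector: the function names mem_total_gib and read_mem_total_gib are hypothetical, and the real probe may use a different source.

```python
import re


def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    # The kernel reports the line as "MemTotal:   <number> kB".
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1)) / (1024 * 1024)


def read_mem_total_gib(path: str = "/proc/meminfo") -> float:
    """Read the live value from the running kernel (Linux only)."""
    with open(path, encoding="ascii") as handle:
        return mem_total_gib(handle.read())
```

Keeping the parser separate from the file read makes it possible to check the logic against sample text on a machine that is not the target host.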

The detector reports the tier level, the detected RAM and GPU, and the recommended model and quantisation for that tier. You can see the result for your own machine with intergen tier. The tiers are:

Tier	Hardware	Model
Tier 1	< 8 GB RAM, no GPU or an integrated GPU	Qwen3.5-2B (Q4_K_M, ~1.5 GB)
Tier 2	8–15 GB RAM with a GPU	Qwen3.5-9B (Q4_K_M, ~5.5 GB)
Tier 3	16 GB+ RAM with a discrete GPU	Qwen3.5-35B-A3B MoE (Q4_K_M, ~21 GB)

InterGen’s bundled llama.cpp is a CPU-only build (an AVX2 instruction-set floor, no GPU offload), so the model that actually runs is governed by what the CPU can handle at a usable response latency, not by GPU acceleration. A machine without a discrete GPU, including most integrated-graphics laptops, is served the Tier 1 (2B) model even when its RAM would otherwise place it in Tier 2, because the 9B model is too slow to run on CPU alone. The exception is a sufficiently powerful CPU (for example, a modern high-core-count or AI-class processor), which can carry the larger model without offload.
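The tier table and the CPU-only rule can be read together as a small decision function. The sketch below is an interpretation of the text on this page, not InterGen's source: recommend_tier, the gpu labels, and the strong_cpu flag are hypothetical names, and combinations the table does not cover (for example 16 GB+ RAM with an integrated GPU) are resolved by falling to the next lower tier, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int
    model: str
    quant: str


# Models and quantisation as listed in the tier table.
TIERS = {
    1: Tier(1, "Qwen3.5-2B", "Q4_K_M"),
    2: Tier(2, "Qwen3.5-9B", "Q4_K_M"),
    3: Tier(3, "Qwen3.5-35B-A3B", "Q4_K_M"),
}


def recommend_tier(ram_gb: float, gpu: str, strong_cpu: bool = False) -> Tier:
    """Pick a tier from RAM size and GPU class ('none', 'integrated', 'discrete')."""
    if gpu not in ("none", "integrated", "discrete"):
        raise ValueError(f"unknown GPU class: {gpu!r}")

    # Step 1: nominal tier from the table.
    if ram_gb >= 16 and gpu == "discrete":
        level = 3
    elif ram_gb >= 8 and gpu != "none":
        level = 2  # assumption: also covers 16 GB+ with an integrated GPU
    else:
        level = 1

    # Step 2: CPU-only rule. Without a discrete GPU the 9B model is served
    # only when the CPU is judged strong enough to carry it.
    if level == 2 and gpu != "discrete" and not strong_cpu:
        level = 1
    return TIERS[level]
```

For example, recommend_tier(12, "integrated") returns the Tier 1 entry, while the same call with strong_cpu=True returns Tier 2. The authoritative answer for a given machine is still whatever intergen tier reports.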

A small embedding model (nomic-ai/nomic-embed-text-v1.5) ships alongside the assistant to power its semantic-matching layer.

The full assistant architecture — the priority router, the AUTO/CONFIRM/BLOCKED safety classifier, D-Bus and MCP integration, and memory — is documented in the assistant section.

Security scanning and cloud escalation

Two related features sit alongside the local assistant. Both are local-by-default and consent-first.

InterGen Sentinel is a pluggable security scanner that inspects content crossing the device boundary: data coming in from external and MCP tools, and content about to be sent off-device. Its default configuration runs entirely on-device — a fast local-rules pass plus an optional deeper pass backed by a small local Qwen classifier. For deeper analysis you may opt in to a cloud scanner backed by one of six providers: Claude (Anthropic), Gemini (Google), Copilot (Microsoft), ChatGPT (OpenAI), Grok (xAI), or DeepSeek. None is configured by default.

Phone-A-Friend (Frontier/Cloud Escalation) is an optional path for handing a request to a more capable frontier model when the local assistant cannot satisfy it. It is off by default, asks before reaching out, stores API keys in the system keyring, and scans every outbound payload through Sentinel’s egress policy first. Cloud assistance is available when you ask for it, never imposed.
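The consent-first ordering described above can be pictured as a chain of gates in which nothing leaves the device unless every gate passes. This is an illustrative sketch only: escalate and its parameters are hypothetical, the callables stand in for the real consent prompt, Sentinel egress scan, and provider client, and the relative order of the consent prompt and the scan is an assumption.

```python
from typing import Callable, Optional


def escalate(
    payload: str,
    confirm: Callable[[str], bool],
    egress_scan: Callable[[str], bool],
    send: Callable[[str], str],
    enabled: bool = False,  # off by default, as on this page
) -> Optional[str]:
    """Return the cloud reply, or None if any gate refuses."""
    if not enabled:
        return None  # feature not switched on
    if not confirm(payload):
        return None  # user declined the outbound request
    if not egress_scan(payload):
        return None  # egress policy rejected the payload
    return send(payload)  # reached only after all three gates pass
```

Passing the gates in as callables keeps the sketch independent of any particular provider, and it makes the refusal paths easy to exercise without network access.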

The rest of the platform

InterGenOS ships its own package manager (pkm), its own installer (Forge), a signed Secure Boot chain, dm-verity integrity, and UKI signing. The system is assembled across six package tiers (toolchain, core, base, desktop, ai, extra) through a 20-phase build pipeline. The package total runs to roughly 850 as of this writing; these counts drift as the system grows, so treat any figure as a snapshot rather than a fixed total.

For installation and the GPU-relevant steps of setup, see the Forge installer guide.

Where to go next

New to the GPU stack? Start with Overview & Backend Selection.
Running ML frameworks? See PyTorch / TensorFlow / JAX.
Specific card behaving oddly? Check Per-GPU Driver Notes.