ROCm-from-Source Build Pipeline

This is the contributor-facing companion to the user page ROCm for AMD. It describes how automated ROCm-from-source enablement is meant to be built and maintained so that it fits the rest of the distribution. It documents intent and the design constraints, not commands that do not yet exist. Where a concrete version, package name, or build flag would be required, it is flagged rather than invented.

Status (1.0-dev, build id v1.0-dev1). The AI tier that ships today is CPU-only: the local inference engine is compiled without a GPU compute backend, with an explicit instruction-set floor (AVX2-class) and non-native codegen so the binary runs on every supported target rather than only on the build host. A from-source ROCm/HIP compute backend is not part of the shipped tree today. What ships for AMD hardware today is the graphics and diagnostics stack: the amdgpu kernel driver and AMD firmware in the base image, Mesa’s radeonsi and radv drivers, and an amdgpu diagnostics meta-package in the extra tier. Everything below describes how the compute pipeline is intended to be built when it lands.

Why this follows the same rules as everything else

InterGenOS is built from source, with pinned, checksum-verified inputs and a deterministic build. A GPU compute stack is one of the largest and most interdependent things a distribution can compile, and it is exactly the class of component where a clean compile can still be wrong at runtime: a backend that links but selects a device the hardware cannot actually drive, a runtime library omitted from a package’s file list, or a hardware-detection heuristic that picks an accelerator path on a GPU that does not support it. None of those is a compile error. Each shows up only when the artifact is installed and the engine is run on real hardware.

So a ROCm-from-source pipeline is held to the same doctrine as the rest of the build: it is not trusted on the strength of a clean compile. It earns trust by being installed and exercised on real AMD hardware, with the detection path validated on the lower-end tier that actually reproduces the awkward cases (integrated GPUs, older compute architectures). When a compute component fails to build, the cause is fixed in the source tree, not worked around: no disabling a feature to dodge a missing dependency, no stub that pretends to work, no package quietly moved to another tier to sidestep a wiring problem. See Build from Source and Contributing to InterGenOS for the broader process.

Where it sits in the lifecycle

InterGenOS is produced as a fixed, ordered build of 20 phases: validate, verify-sources, setup, toolchain, chroot-prep, chroot-tools, core, config, core-extra, base, kernel, desktop, ai, extra, bootloader, image, manifest, squashfs, ukis-verity, iso (with an optional publish step). GPU compute support belongs in the ai tier, alongside the local inference engine it accelerates. That placement matters in two ways: the compute stack must be present before the engine that links against it is built, and both must be in the read-only system image before the squashfs and ukis-verity phases seal it. A backend slipped in after the image is sealed would not be covered by the signed integrity chain.
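The ordering constraint can be sketched as a small check. This is an illustration only, not part of the build: the phase names and their order are taken from the list above, while the SEALING_PHASES constant and the is_sealed_in helper are hypothetical names introduced here.

```python
# Phase names and order come from the lifecycle description above.
# SEALING_PHASES and is_sealed_in are hypothetical names for this sketch.
PHASES = [
    "validate", "verify-sources", "setup", "toolchain", "chroot-prep",
    "chroot-tools", "core", "config", "core-extra", "base", "kernel",
    "desktop", "ai", "extra", "bootloader", "image", "manifest",
    "squashfs", "ukis-verity", "iso",
]

# The phases that seal the read-only system image.
SEALING_PHASES = ("squashfs", "ukis-verity")


def is_sealed_in(phase: str) -> bool:
    """Return True if `phase` runs strictly before every sealing phase.

    Raises ValueError if `phase` is not a known phase name.
    """
    position = PHASES.index(phase)
    return all(position < PHASES.index(seal) for seal in SEALING_PHASES)
```

Under this ordering, is_sealed_in("ai") holds and is_sealed_in("iso") does not, which is the property the placement in the ai tier relies on.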

What a from-source ROCm pipeline has to do

A correct pipeline is described by the requirements it must satisfy:

Pinned, verified inputs. Every source component is pinned to an immutable revision and checksum-verified before use, the same as any other package. Compute stacks pull many interlocking repositories; each one is pinned independently, and a build number a tool reports is never substituted for the revision that actually produced it.
Built into the system image, then sealed. The runtime libraries, the device-support files, and anything the engine loads at runtime are installed before squashfs/ukis-verity, so they are covered by the same verified-integrity chain as the rest of the system.
Runtime libraries asserted, not assumed. The pipeline asserts that every shared library the engine dynamically links against is actually installed and in the right place. The present-but-unloadable defect class — the binary exists, its libraries do not — passes a binary-only check and fails at first run, so the package’s verify paths assert the libraries land too. (The shipped CPU engine already does this, asserting its own libllama/libggml shared objects.)
Honest hardware detection. The engine selects a GPU compute path only on hardware that can run it, and falls back to the CPU build otherwise. This is validated on the hardware that reproduces the hard cases, not asserted from a capable development box.
No silent degradation. If the GPU path is unavailable, the system says so and runs on CPU; it does not quietly ship a broken accelerator path or mask the failure. A masked failure is exactly what the security posture forbids.
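Two of these requirements, asserted runtime libraries and a fallback that announces itself, can be sketched together. This is a minimal sketch under stated assumptions, not the shipped verify mechanism. The function names, the message wording, and the way library paths are expressed are all hypothetical. How hardware support is detected is deliberately left out, so gpu_supported is simply passed in.

```python
from __future__ import annotations

from pathlib import Path


def missing_libraries(root: Path, required: list[str]) -> list[str]:
    """Return the required paths (relative to `root`) that are not regular files.

    A verify step would treat a non-empty result as a failure, so that a
    binary without its libraries cannot pass a binary-only check.
    """
    return [rel for rel in required if not (root / rel).is_file()]


def select_backend(gpu_supported: bool, missing: list[str]) -> tuple[str, str]:
    """Choose a backend and always return the reason alongside it.

    The reason string is the point: falling back to CPU is reported,
    never hidden. `gpu_supported` stands in for a real detection result.
    """
    if not gpu_supported:
        return "cpu", "GPU compute path not supported on this hardware; using CPU"
    if missing:
        return "cpu", "GPU runtime libraries missing: " + ", ".join(missing) + "; using CPU"
    return "gpu", "GPU compute path available"
```

Returning the reason together with the choice means a caller cannot obtain a backend without also receiving the explanation it is expected to surface.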

The specific source repositories, revisions, build flags, supported compute architectures, the compute package name(s), and its verify paths are intentionally not listed here, because they are not in the shipped tree.
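The first requirement, pinned and checksum-verified inputs, can be illustrated without any of those specifics. The sketch below assumes SHA-256 as the digest, which this page does not state, and every function name in it is hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks so large archives are not loaded whole."""
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk


def sha256_hex(chunks: Iterable[bytes]) -> str:
    """Return the hex SHA-256 digest of the concatenated chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def matches_pin(chunks: Iterable[bytes], expected_sha256: str) -> bool:
    """Return True only if the content hashes to the pinned checksum.

    The comparison is against the recorded pin, never against a value
    the source or a tool reports about itself.
    """
    return sha256_hex(chunks) == expected_sha256.strip().lower()
```

A caller would run matches_pin(read_chunks(archive_path), pinned_checksum) for each component and stop the build on the first mismatch, with archive_path and pinned_checksum supplied by whatever records the pins.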

How package counts and tiers are read

Tier contents and package counts drift between builds. As of 2026-06-15 the tree spans six tiers (toolchain, core, base, desktop, ai, extra); the ai tier is small today, and the total across all tiers is in the high hundreds. Derive the live counts from the package tree rather than trusting a fixed number, and check the release notes for the authoritative state.
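Deriving the counts might look like the sketch below. It assumes a tree laid out as one directory per tier containing one subdirectory per package. That layout is an assumption made for illustration, not a description of the actual tree, and count_packages is a hypothetical name. The tier names are the six listed above.

```python
from __future__ import annotations

from pathlib import Path

# The six tiers named above.
TIERS = ("toolchain", "core", "base", "desktop", "ai", "extra")


def count_packages(tree_root: Path) -> dict[str, int]:
    """Count package directories per tier under `tree_root`.

    Assumes <tree_root>/<tier>/<package>/ (a hypothetical layout).
    A tier directory that does not exist is reported as 0 rather than skipped.
    """
    counts: dict[str, int] = {}
    for tier in TIERS:
        tier_dir = tree_root / tier
        if tier_dir.is_dir():
            counts[tier] = sum(1 for entry in tier_dir.iterdir() if entry.is_dir())
        else:
            counts[tier] = 0
    return counts
```

Summing the values of the returned mapping gives the total across all tiers for the tree as it stands at that moment.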

How it is maintained

When the pipeline lands, it is maintained the way the rest of the build is. A fix to a compute component is made in the source tree, never as a manual edit on a running target, and it is not done until a clean from-scratch build reproduces the corrected behavior with no manual steps. A rebuilt package can be slipstreamed onto an already-booted image for fast diagnosis, but that surgical edit must be saved to the tree and prove itself on the next from-scratch build before it counts. Because GPU and graphics paths are timing- and firmware-sensitive, detection and first-run behavior are cleared over several consecutive cold boots on representative hardware, not a single warm reboot.
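The cold-boot rule can be expressed as a streak check. This is a sketch under assumptions: "several" is not quantified on this page, so the required count is a parameter the caller must choose, and the record format and function names are hypothetical. The sketch takes the conservative reading that a warm reboot or a failed boot resets the streak.

```python
from __future__ import annotations


def cold_boot_streak(results: list[tuple[str, bool]]) -> int:
    """Count the trailing run of passing cold boots.

    `results` is chronological; each entry is (boot_kind, passed) with
    boot_kind "cold" or "warm". A warm boot or a failure ends the run.
    """
    streak = 0
    for kind, passed in reversed(results):
        if kind != "cold" or not passed:
            break
        streak += 1
    return streak


def is_cleared(results: list[tuple[str, bool]], required: int) -> bool:
    """Return True once the most recent `required` boots were all passing cold boots.

    `required` must be at least 2, since a single boot is exactly what
    the rule above rejects.
    """
    if required < 2:
        raise ValueError("required must be at least 2")
    return cold_boot_streak(results) >= required
```

For example, a history of cold pass, warm pass, cold pass, cold pass gives a streak of 2, because the warm reboot in the middle breaks the run.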

How this relates to the local assistant

The accelerator exists to serve the on-device assistant. InterGen is the tiered, hardware-detected, offline-first local assistant; it runs entirely on the machine, sizes the model (Qwen) it loads to the hardware it finds, and reports no telemetry. A from-source GPU compute backend is what would let InterGen run larger models faster on capable AMD hardware while still falling back cleanly to the shipped CPU engine everywhere else. The opt-in frontier-model escalation path, Phone-A-Friend (Frontier/Cloud Escalation), is unrelated to local acceleration; nothing about a GPU backend changes the offline-first, telemetry-free posture of the local assistant. The same holds for the security scanner, InterGen Sentinel: its default backends are local (Local-Rules and Local-Qwen), with cloud providers as an explicit opt-in. Local acceleration serves the local default; it introduces no network dependency.

Principles that govern this work

A machine you understand, can modify, and can trust is the goal, and a GPU compute stack — large, opaque, and easy to ship in a degraded state — is precisely where that is tested. The pipeline is built from source, sealed inside the integrity chain, verified on real hardware, and honest about when the accelerator is and is not available. Until those properties hold on a clean build, the GPU path does not ship, and this page does not claim it has.
