Packaging & Package Manager Internals
InterGenOS builds every package from source and ships the results as signed binary archives. This page documents the two halves of that pipeline: igos-build, the factory that compiles source into archives, and pkm, the package manager that installs, removes, queries, and verifies those archives on a running system. It covers the recipe model, the on-disk database, signing and integrity, and the reproducible-builds direction.
This is a developer and contributor reference. The gates described below mirror the maintainer workflow for adding a single package end to end.
Version: 1.0-dev (build id v1.0-dev1).
Two components, one pipeline
igos-build is the factory; pkm is the consumer.
- igos-build compiles source code in an isolated chroot and produces a .igos.tar.gz archive per package.
- During the tracking phase, igos-build generates the initial file list and content hashes for each archive.
- If a build uses direct_install (deploying straight into the chroot rather than to a DESTDIR staging tree), igos-build calls pkm internally to register the resulting files into the package database.
- For end users, pkm is the CLI tool that downloads and installs those .igos.tar.gz archives from the network repository.
igos-build constructs packages from source in an isolated environment; pkm operates on the live filesystem.
The package recipe model
A package is a directory at packages/<tier>/<name>/:
packages/<tier>/<name>/
├── package.yml ← required: metadata + verify_paths
├── build.sh ← required for build_style: custom; optional for autotools/meson
├── <name>.1 ← optional manpage (some packages ship a tracked manpage in-recipe)
└── patches/ ← optional, build.sh-consumed
Tiers
InterGenOS organizes packages into six tiers. Each tier is built by exactly one builder, and a package’s tier: field determines which one. As of 2026-06-15, the live tree carries roughly 857 packages: toolchain 28, core 272, base 23, desktop 420, extra 112, and ai 2. These counts drift as packages land; derive the current numbers from the tree rather than treating any number as fixed.
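Deriving the counts from the tree can be sketched in a few lines. This is an illustration only: the function name count_packages_by_tier is hypothetical, and it assumes every recipe directory sits at packages/<tier>/<name>/ and contains a package.yml, as described above.

```python
from collections import Counter
from pathlib import Path


def count_packages_by_tier(packages_root: Path) -> Counter:
    """Count recipe directories laid out as <root>/<tier>/<name>/package.yml.

    A directory only counts as a package if it holds package.yml, the one
    file every recipe is required to carry.
    """
    counts: Counter = Counter()
    for recipe in packages_root.glob("*/*/package.yml"):
        tier = recipe.parent.parent.name  # <tier> is two levels above the file
        counts[tier] += 1
    return counts
```

Calling count_packages_by_tier(Path("packages")) from the repository root returns a per-tier mapping, and sum(counts.values()) gives the total.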
| Tier | Builder | Where wired |
|---|---|---|
| toolchain | bootstrap toolchain phases | build orchestrator |
| core | bash static list | scripts/chroot-build-*.sh (one per phase) |
| base | bash static list | scripts/chroot-build-base.sh |
| desktop | Python topological sort | igos-build.py --tier desktop |
| extra | Python topological sort | igos-build.py --tier extra |
| ai | Python topological sort | igos-build.py --tier ai |
Pick the tier that matches the package: foundational utilities go to core or base; GUI and desktop applications go to desktop; servers, dev tooling, browsers, and system utilities go to extra; the local AI runtime goes to ai.
The two builders are mutually exclusive. The bash static-list builder (tiers core and base) enumerates packages explicitly in run_package lists inside scripts/chroot-build-<phase>.sh. The Python tier-driver (igos-build.py, tiers desktop / extra / ai) walks packages/<tier>/ at build time, computes the topological-sort closure of declared dependencies:, and builds everything reachable. A package must be reachable by exactly one builder; a recipe reachable by neither is an orphan that will never build. The orphan detector at scripts/check-builder-coverage.py is the early-warning system for that class and should report OK: all packages reachable by exactly one builder.
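The topological-sort closure can be illustrated with the standard library's graphlib. This is a sketch of the technique, not the code in igos-build.py: the function name build_order and the in-memory recipes mapping are assumptions, and dependencies that have no recipe in the mapping are assumed to belong to another builder.

```python
from graphlib import TopologicalSorter


def build_order(recipes: dict[str, list[str]], targets: list[str]) -> list[str]:
    """Return a dependencies-first build order for targets and all they need.

    `recipes` maps package name -> declared dependency names (a hypothetical
    in-memory form of the dependencies: blocks).
    """
    closure: dict[str, list[str]] = {}
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name in closure or name not in recipes:
            continue
        # Keep only dependencies this builder can see; others are skipped.
        deps = [dep for dep in recipes[name] if dep in recipes]
        closure[name] = deps
        stack.extend(deps)
    # static_order() raises graphlib.CycleError on a dependency cycle.
    return list(TopologicalSorter(closure).static_order())
```

Packages that are never reached from any target simply do not appear in the result, which is the same condition the orphan detector is meant to surface.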
package.yml
The recipe metadata file declares identity, source, dependencies, and the paths the package is expected to install:
name: <name>
version: "<semver>"
release: 1
description: <one-line description>
license: <SPDX-identifier>
homepage: https://<upstream-homepage>
tier: <core|base|desktop|extra|ai>
build_style: <custom|autotools|meson|cmake|cargo|python>
source:
  - url: https://<mirror-or-upstream-url>/<name>-<version>.tar.<ext>
    sha256: <expected-sha256-of-tarball>
dependencies:
  build: []        # build-only deps (autoconf, pkgconf, etc.)
  host: []         # host-only tooling (rare)
  runtime:         # runtime deps (libraries linked, interpreters required)
    - <dep1>
verify_paths:
  - /usr/bin/<name>
  - /usr/lib/lib<name>.so
  - /etc/<name>/<name>.conf
The source tarball’s sha256: is pinned in the recipe; a download whose hash does not match is rejected. The discipline is mirror-first: stage the tarball on the InterGenOS source mirror or a file:/// URL inside the chroot. Networked https:// URLs are accepted but are not the default posture.
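The hash pin amounts to a check like the one below. It is a minimal sketch: sha256_of and check_source are hypothetical names, and how the real fetcher reports a mismatch is not documented here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large tarball never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_source(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the tarball does not match the recipe's pinned hash."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path.name}: expected {expected_sha256}, got {actual}")
```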
verify_paths: the install contract
verify_paths: is the package’s declaration of which files prove it landed. Each entry must be an absolute path with at least three segments (for example /usr/bin/x), and a recipe should pick two or three paths that establish identity:
- The primary binary at /usr/bin/<name> or /usr/sbin/<name> is the strongest identity signal.
- The primary library at /usr/lib/lib<name>.so* for library-only packages.
- A canonical directory under /usr/share/<name>/, /usr/lib/<name>/, /etc/<name>/, or a firmware path for data, firmware, or config packages.
- The site_perl or site-packages path for Perl/Python module packages.
- For the kernel, /boot/vmlinuz-<version> plus /usr/lib/modules/<version>.
verify_paths is enforced at two gates. The pre-push hook refuses a new package.yml that declares neither verify_paths: nor pending_acquisition:. At image-build time, the pre-squashfs audit (scripts/pre-squashfs-audit.py) verifies every declared path is actually present in the chroot and halts the build if one is missing. A helper at igos-build/verify_paths_derive.py can auto-derive a fallback when a recipe omits the field, but the human-curated declaration remains the source of truth.
A package that legitimately cannot be acquired yet (blocked on an external dependency such as upstream sponsorship) replaces the verify_paths: block with pending_acquisition: "<reason>". The pre-squashfs audit skips such packages. This is reserved for blocked-on-external cases, not as a workaround for skipping the install contract.
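The image-build gate and its pending_acquisition exemption can be sketched as follows. This is not the code in scripts/pre-squashfs-audit.py: the function name audit_recipe and the already-parsed recipe mapping are assumptions made for illustration.

```python
import os
from pathlib import Path


def audit_recipe(recipe: dict, chroot: Path) -> list[str]:
    """Return the declared verify_paths that are missing from the chroot.

    `recipe` is a hypothetical parsed package.yml. A recipe that carries
    pending_acquisition is skipped and reports nothing missing.
    """
    if "pending_acquisition" in recipe:
        return []
    missing = []
    for declared in recipe.get("verify_paths", []):
        # Re-root the absolute path inside the chroot before checking it.
        candidate = chroot / declared.lstrip("/")
        # lexists() accepts symlinks whose targets only resolve inside the chroot.
        if not os.path.lexists(candidate):
            missing.append(declared)
    return missing
```

A caller would halt the build as soon as any recipe returns a non-empty list.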
build.sh
For build_style: custom, build.sh defines configure, build, and do_install functions; the orchestrator sources the script in the chroot’s per-package work directory and calls each in sequence. An optional post_install runs after the package is registered, for cache-rebuild work such as ldconfig or systemd presets that depends on the files already being in place.
#!/bin/bash

configure() {
    set -e
    ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var
}

build() {
    set -e
    make -j"$(nproc)"
}

do_install() {
    set -e
    make DESTDIR="${DESTDIR}" install
}
Two recipe rules carry weight here. First, stub functions are forbidden: a recipe whose configure() is only : and whose do_install() produces nothing is rejected. If the package has source, it must compile. A true metapackage with no source declares source: [] and uses do_install() only to write config files, with a header comment stating the package is intentionally meta. Second, patches to bundled source are applied by hand inside build.sh from the recipe’s own patches/ dir, not via a package.yml patches: key. That key is auto-applied from the source-staging directory and is for patches that ship alongside a fetched upstream tarball.
The package manager: pkm
pkm manages the state of the installed operating system: installation, removal, querying, and integrity verification of the pre-compiled .igos.tar.gz archives.
Hybrid data model
pkm uses two stores that mirror each other:
- A SQLite database at /var/lib/igos/pkm.db is the primary source of truth, used for fast queries and transaction management.
- Text manifests at /var/lib/igos/packages/ are human-readable records generated alongside the database for inspection and transparency.
The duality keeps queries fast without giving up auditability: every database fact has a readable text counterpart you can inspect directly.
Database schema
The schema (pkm/database.py) consists of several interconnected tables:
- installed tracks active and superseded package metadata: name (unique), version, tier, description, install_date, superseded_by, superseded_at.
- files maps every deployed file to its owning package: package_id (foreign key), path, is_dir, is_config, checksum (SHA-256). Indexed by path for fast owner lookups.
- depends tracks runtime package relationships.
- available caches remote repository metadata, synced via pkm sync.
- history is an append-only transaction log of every operation (installs, removals, supersedes) for audit trails.
- config_files tracks files in /etc/ separately to manage user modifications.
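The two tables whose columns are listed above might look roughly like this in SQLite. Treat it as a sketch: the column types, the id primary key, the index name, and the owner_of helper are assumptions, and the authoritative definitions are in pkm/database.py.

```python
import sqlite3
from typing import Optional

# Assumed DDL: column names follow the list above, types are guesses.
SCHEMA = """
CREATE TABLE installed (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE,
    version       TEXT NOT NULL,
    tier          TEXT,
    description   TEXT,
    install_date  TEXT,
    superseded_by TEXT,
    superseded_at TEXT
);
CREATE TABLE files (
    package_id INTEGER NOT NULL REFERENCES installed(id),
    path       TEXT NOT NULL,
    is_dir     INTEGER NOT NULL DEFAULT 0,
    is_config  INTEGER NOT NULL DEFAULT 0,
    checksum   TEXT
);
CREATE INDEX idx_files_path ON files(path);
"""


def owner_of(db: sqlite3.Connection, path: str) -> Optional[str]:
    """Return the name of the package that owns `path`, or None."""
    row = db.execute(
        "SELECT i.name FROM files f JOIN installed i ON i.id = f.package_id "
        "WHERE f.path = ?",
        (path,),
    ).fetchone()
    return row[0] if row else None
```

The index on files.path is what keeps an owner lookup from scanning every row.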
Installation pipeline
pkm/installer.py follows a strict sequence:
- Staging extraction. The archive is extracted to a temporary staging directory using hardened tar flags (--no-same-owner, --no-same-permissions).
- Manifest reading. The staged manifest is parsed for the file list, SUPERSEDES declarations, and embedded file hashes.
- Invariant checks. pkm checks for directory collisions (for example, preventing a package from replacing the /lib symlink with a directory) and verifies that any predecessors named in SUPERSEDES exist and are correctly ordered in the install queue.
- Filesystem deploy. The archive is deployed to the root filesystem with tar --no-overwrite-dir --keep-directory-symlink. The hardened flags drop setuid/setgid bits, which are then explicitly restored by parsing the tarfile header metadata. This deploy is the point of no return: a successful write commits the package to disk before the database transaction that records it.
- Atomic DB transaction. A single BEGIN/COMMIT block creates the installed record, registers files with their content hashes, transfers overlapping file ownership for any SUPERSEDES predecessor (marking it superseded_by), and appends a history entry.
- Text manifest generation. The text manifest at /var/lib/igos/packages/{name}-{version} is written reflecting the final database state.
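The atomic database step can be sketched with sqlite3's connection context manager, which commits on success and rolls back on any exception. The tables here are deliberately simplified assumptions, and record_install is a hypothetical name, not the function in pkm/installer.py.

```python
import sqlite3

# Simplified, assumed tables; the real schema lives in pkm/database.py.
SKETCH_SCHEMA = """
CREATE TABLE installed (id INTEGER PRIMARY KEY, name TEXT UNIQUE, version TEXT);
CREATE TABLE files (package_id INTEGER, path TEXT NOT NULL, checksum TEXT NOT NULL);
CREATE TABLE history (id INTEGER PRIMARY KEY, action TEXT, package TEXT);
"""


def record_install(db: sqlite3.Connection, name: str, version: str,
                   files: dict) -> None:
    """Record a package, its files (path -> SHA-256), and a history entry.

    All three writes share one transaction, so a failure in any of them
    leaves no half-registered package behind.
    """
    with db:  # commit on success, roll back if any statement raises
        cur = db.execute(
            "INSERT INTO installed (name, version) VALUES (?, ?)",
            (name, version),
        )
        db.executemany(
            "INSERT INTO files (package_id, path, checksum) VALUES (?, ?, ?)",
            [(cur.lastrowid, path, digest) for path, digest in files.items()],
        )
        db.execute(
            "INSERT INTO history (action, package) VALUES ('install', ?)",
            (name,),
        )
```

Note what the transaction does not cover: the files are already on disk by this point, so a rollback here undoes the bookkeeping only.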
Removal
pkm/remover.py processes files cautiously. The file list is read from the database; configuration files under /etc/ are skipped to preserve user data unless --force is supplied. Regular files are unlinked. If a file’s on-disk hash differs from the database hash (it was modified post-install), pkm warns but proceeds. Directories are removed only when they become completely empty. The installed, files, and depends records are then deleted.
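The removal rules can be illustrated against a scratch directory tree. This sketch is not pkm/remover.py: remove_package_files is a hypothetical name, the records mapping stands in for database rows, and walking up through emptied parent directories is a simplification of how directory ownership is really tracked.

```python
import hashlib
from pathlib import Path


def remove_package_files(root: Path, records: dict, force: bool = False) -> list:
    """Remove a package's files under `root` and return any warnings.

    `records` maps absolute path -> SHA-256 recorded at install time.
    """
    warnings = []
    touched = set()
    for path, recorded in records.items():
        if path.startswith("/etc/") and not force:
            continue  # preserve user configuration unless forced
        target = root / path.lstrip("/")
        if not target.is_file():
            continue
        if hashlib.sha256(target.read_bytes()).hexdigest() != recorded:
            warnings.append(f"{path}: modified since install")  # warn, then proceed
        target.unlink()
        touched.add(target.parent)
    # Deepest directories first, so a parent can empty out after its children.
    for directory in sorted(touched, key=lambda p: len(p.parts), reverse=True):
        while directory != root:
            try:
                directory.rmdir()  # succeeds only on an empty directory
            except OSError:
                break
            directory = directory.parent
    return warnings
```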
Upgrades: the supersede model
pkm handles upgrades through supersede semantics. When package B supersedes package A:
- Files present in both A and B are overwritten on disk by B; in the database, ownership transfers from A to B with updated SHA-256 hashes.
- Files that existed in A but not B are left in place, still owned by A’s historical record.
- A’s installed record is marked superseded_by B, with the time recorded in superseded_at.
This supports fine-grained package splits (for example util-linux splitting into util-linux-core and util-linux-extra) without orphaning files or breaking dependencies.
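The ownership bookkeeping behind a supersede can be shown with plain dictionaries. This is a sketch under stated assumptions: apply_supersede is a hypothetical name, the dictionaries stand in for rows of the files table, and refusing paths owned by an unrelated package is an assumed conflict policy.

```python
def apply_supersede(owners: dict, hashes: dict, old: str, new: str,
                    new_files: dict) -> dict:
    """Transfer ownership of shared paths from `old` to `new`.

    owners: path -> owning package; hashes: path -> SHA-256.
    new_files: path -> SHA-256 for every file shipped by `new`.
    Returns the marker to store on the old package's record.
    """
    for path, digest in new_files.items():
        current = owners.get(path)
        if current not in (None, old, new):
            raise ValueError(f"{path} is owned by {current}, not {old}")
        owners[path] = new      # shared paths move over; new paths are claimed
        hashes[path] = digest   # the hash now reflects what `new` deployed
    # Paths only `old` shipped are untouched and stay owned by `old`.
    return {"name": old, "superseded_by": new}
```

With util-linux superseded by util-linux-core, a path that both ship changes owner, while a path only util-linux shipped stays attributed to it.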
Integrity and security
InterGenOS treats verifiability as a core property, not an add-on. The package layer contributes three checks.
Content-hash verification
pkm verify recalculates the SHA-256 hash of every installed file on disk and compares it against the expected hash in the files table. This detects both accidental corruption and unauthorized modification of installed binaries.
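In outline, the verification pass compares each recorded hash against the file on disk. The sketch below assumes a hypothetical verify_installed function and an expected mapping in place of the files table; the real command's output format is not shown here.

```python
import hashlib
from pathlib import Path


def verify_installed(root: Path, expected: dict) -> dict:
    """Return {path: "missing" | "modified"} for files that fail the check.

    `expected` maps absolute path -> SHA-256 recorded at install time.
    """
    problems = {}
    for path, recorded in expected.items():
        target = root / path.lstrip("/")
        if not target.is_file():
            problems[path] = "missing"
        elif hashlib.sha256(target.read_bytes()).hexdigest() != recorded:
            problems[path] = "modified"
    return problems
```

An empty result means every tracked file still matches its recorded hash.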
Archive verification
Before installation, the cli.py layer exposes an --archive-trust flag. In strict or repo-only modes, the incoming archive’s SHA-256 is computed and checked against the trusted available index synced from the central repository. An archive whose hash is not in the trusted index is rejected.
Config vs. content
pkm tracks is_config state for files under /etc/, and the config_files table stores each file’s original deployed hash. When an upgrade ships a new config file, pkm attempts a safe merge, or leaves a .new file if it detects the user has modified the active configuration on disk. User configuration is preserved by default.
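The .new fallback can be sketched as below. The function name deploy_config is hypothetical, and the merge attempt mentioned above is left out: this shows only the decision between overwriting an untouched config and parking the update beside a modified one.

```python
import hashlib
from pathlib import Path


def deploy_config(active: Path, original_sha256: str, new_content: bytes) -> Path:
    """Install a new config version without clobbering user edits.

    `original_sha256` is the hash recorded when the file was first deployed.
    Returns the path that was actually written.
    """
    if active.exists():
        on_disk = hashlib.sha256(active.read_bytes()).hexdigest()
        if on_disk != original_sha256:
            # The user changed the file: keep it and write the update alongside.
            side = active.with_name(active.name + ".new")
            side.write_bytes(new_content)
            return side
    active.write_bytes(new_content)
    return active
```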
The signed chain around packages
The package archives sit inside a broader signed integrity chain that InterGenOS ships today: a signed Secure Boot chain, dm-verity integrity for the read-only system image, and signed Unified Kernel Images (UKIs). Content-hash verification ties each installed file back to a known-good value. Together these mean the path from “what the repository published” to “what is running on disk” is checkable at every link, not just at download time. The result is a machine you understand, can modify, and can trust.
Reproducible builds
Reproducible builds are a documented goal for InterGenOS and a security primitive: starting from the same source, the same toolchain, and the same documented build environment, two independent builders should produce byte-identical output. That property lets a third party detect silent backdoor insertion or build-environment tampering that signing alone cannot catch. It is the difference between “we signed the binary” and “the binary is independently verifiable from source.”
The current state is partial. Bit-identical output is the documented 1.0 goal but the full toolchain plumbing is still being built out. SOURCE_DATE_EPOCH is partially honored in the manifest-emission path of the build orchestrator, and the cargo-vendor pipeline (scripts/cargo-vendor-gen.sh) already produces reproducible vendor tarballs using the standard recipe:
tar --sort=name --owner=0 --group=0 --numeric-owner \
    --mtime="@${SOURCE_DATE_EPOCH}" --format=pax -cf - vendor/ | xz -T 1 -9
The .igos.tar.gz archive emitter (scripts/emit-package-archives.py) does not yet apply the same normalization. Python’s tarfile defaults embed source mtimes, the builder’s uid/gid, and a gzip header timestamp, all of which vary between builders. The planned fix normalizes each tar member (uid/gid zeroed, owner/group root, mtime pinned to SOURCE_DATE_EPOCH) and writes the gzip layer separately with a zero timestamp. Other identified gaps include exporting SOURCE_DATE_EPOCH into the chroot so compilers honor it, build-path prefix mapping (-ffile-prefix-map / --remap-path-prefix) to keep absolute chroot paths out of binaries, and pinning LC_ALL=C plus TZ=UTC.
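The planned normalization can be sketched with Python's tarfile and gzip modules. This is an illustration of the technique, not the contents of scripts/emit-package-archives.py: emit_archive is a hypothetical name, and file modes are passed through unchanged, so the builder's umask is assumed to be pinned elsewhere.

```python
import gzip
import io
import os
import tarfile
from pathlib import Path


def emit_archive(src: Path, epoch: int) -> bytes:
    """Pack `src` into .tar.gz bytes that do not depend on who built them."""

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.uid = info.gid = 0
        info.uname = info.gname = "root"
        info.mtime = epoch  # pinned to SOURCE_DATE_EPOCH by the caller
        return info

    tar_bytes = io.BytesIO()
    with tarfile.open(fileobj=tar_bytes, mode="w", format=tarfile.PAX_FORMAT) as tar:
        # Sort members explicitly; directory walk order varies by filesystem.
        for path in sorted(src.rglob("*")):
            tar.add(path, arcname=path.relative_to(src).as_posix(),
                    recursive=False, filter=normalize)
    out = io.BytesIO()
    # Write the gzip layer separately: mtime=0 zeroes the header timestamp
    # and filename="" keeps a file name out of the header.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=0) as gz:
        gz.write(tar_bytes.getvalue())
    return out.getvalue()
```

A caller would pass int(os.environ["SOURCE_DATE_EPOCH"]) as epoch. Two runs over the same tree then yield identical bytes on the same Python and zlib versions, which still leaves the compiler-level gaps listed above.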
The 1.x target is byte-identical .igos.tar.gz per package across two builders, plus an audit harness (scripts/verify-reproducible.sh, planned) that any contributor can run to confirm the property for a given package. Bootstrap-tier (toolchain) reproducibility and full ISO determinism are scoped beyond 1.x. None of the reproducibility plumbing beyond what is described as current should be read as shipping today.
See also
- FORGE Internals — the installer that lays the built image onto a machine.
- Security Review — the signed-chain and integrity posture in depth.
- FORGE Installer Guide — installing the built image.