nixlab

A NixOS homelab template that stands up a k3s cluster with WireGuard VPN, Nextcloud, Pi-hole, and automatic TLS - all configured from a single file.

What it is

nixlab is an opinionated NixOS flake for running a multi-node k3s homelab. The entire cluster is configured from one file (vars.nix). Adding a node means copying a hardware config template, filling in an IP and disk, and running colmena apply. Everything else - k3s roles, service deployment, disk layout, impermanence, secrets - is derived automatically.

Every boot wipes / via a btrfs rollback in the initrd. Only /persist survives, so your nodes are always in a known-good state.

What's included

| Service | Purpose | Exposure |
|---|---|---|
| MetalLB | LoadBalancer IPs from your LAN | pool from vars.metallbPool |
| Longhorn | Distributed block storage | internal |
| nginx ingress | HTTP/S ingress controller | vars.nginxIp |
| Pi-hole | LAN DNS + ad blocking | vars.piholeIp |
| ExternalDNS (Pi-hole) | Auto-registers local DNS from ingress | LAN |
| ExternalDNS (Cloudflare) | Auto-registers public DNS | public |
| cert-manager | Let's Encrypt TLS via DNS-01 | - |
| DDNS | Keeps Cloudflare A record current | - |
| WireGuard | VPN with per-user Nextcloud SSO | vars.wireguardIp |
| Nextcloud | Self-hosted cloud storage | nextcloud.<vars.domain> |
| Signal proxy | Signal messenger proxy | signal.<vars.domain> |

Design principles

One config file. vars.nix is the only file you edit. Everything the cluster needs - usernames, IPs, domains, nodes, WireGuard users - lives there. Nix propagates it everywhere.

Nodes are just hardware. Every NixOS config comes from modules/system/node.nix, which reads your node's entry in vars.nodes by hostname. A hosts/<name>/hardware-configuration.nix file is the only per-node artifact.

Immutable by default. Every boot starts from a clean btrfs subvolume. State that needs to survive goes into /persist via the impermanence module. This makes nodes predictable and easy to rebuild.

No hand-rolling YAML. Kubernetes manifests are generated in Nix from nixhelm chart definitions and nix-kube-generators. The master node's activation script applies them in dependency order via a systemd service - no external CD tool required.

Architecture

Cluster topology

nixlab runs a k3s cluster where nodes are divided into exactly one master and any number of workers. The master runs the k3s server process plus all Kubernetes workloads. Workers run k3s agent and contribute CPU, RAM, and disk (via Longhorn) to the cluster.

graph TD
    subgraph LAN["LAN (192.168.1.0/24)"]
        Router["Router / DHCP"]
        subgraph Cluster["k3s cluster"]
            Master["master node\nk3s server\nk8s-deploy service\nall Helm charts"]
            Worker1["worker node\nk3s agent\nLonghorn replica"]
            Worker2["worker node\nk3s agent\nLonghorn replica"]
        end
        subgraph MetalLB["MetalLB IP pool (192.168.1.192/26)"]
            PiholeIP["Pi-hole\n192.168.1.250"]
            WgIP["WireGuard\n192.168.1.194"]
            NginxIP["nginx ingress\n192.168.1.193"]
        end
        Client["LAN client"]
        VPNClient["VPN client\n10.0.100.x"]
    end
    Cloudflare["Cloudflare DNS\n+ ACME DNS-01"]

    Router -->|"DHCP / mDNS"| Master
    Router -->|"DHCP / mDNS"| Worker1
    Router -->|"DHCP / mDNS"| Worker2
    Client -->|"DNS"| PiholeIP
    Client -->|"HTTP/S"| NginxIP
    VPNClient -->|"WireGuard UDP 51820"| WgIP
    VPNClient -->|"Nextcloud HTTPS via caddy sidecar"| WgIP
    Master -->|"DDNS + ExternalDNS"| Cloudflare
    Cloudflare -->|"DNS-01 challenge"| Master

How vars.nix flows through the system

vars.nix is a plain Nix attribute set. It is imported at three levels:

  1. flake.nix: reads vars.nodes to build nixosConfigurations and the Colmena deployment set. Each node becomes one configuration, using node.hostname as the NixOS hostname and node.tags as Colmena tags for targeted deploys.

  2. modules/system/node.nix: receives hostname as a specialArg and looks up vars.nodes.${hostname} to set diskConfig.device, diskConfig.espSize, and the master flag for k3s.

  3. modules/system/k8s/charts.nix and all service files under k8s/services/: import vars.nix directly and read domain, metallbPool, piholeIp, wireguardIp, nginxIp, wireguardUsers, and upstreamDns to populate Helm values and raw manifests.

graph LR
    V["vars.nix"]
    F["flake.nix\nnixosConfigurations\ncolmena"]
    N["node.nix\ndiskConfig\nk3s.master"]
    C["charts.nix\nHelm values\nraw manifests"]

    V --> F
    V --> N
    V --> C
    F --> N

Nothing else in the repo needs to be edited to configure the cluster. vars.nix is the single source of truth.

Disk layout and impermanence

Each node's disk is partitioned with Disko:

/dev/sdX (GPT)
├── boot   (1 MiB, BIOS boot)
├── ESP    (500 MiB, vfat, /boot)
└── root   (remainder, LVM PV)
    └── root_vg / root (LVM LV)
        └── btrfs
            ├── subvol /root     → /          (wiped on every boot)
            ├── subvol /persist  → /persist   (survives reboots)
            └── subvol /nix      → /nix       (Nix store, survives reboots)

During early boot, a systemd initrd service (rollback) runs before / is mounted:

  1. Mounts the raw btrfs volume.
  2. Renames the current /root subvolume to /old_roots/<timestamp>.
  3. Creates a fresh empty /root subvolume.
  4. Deletes old_roots entries older than 30 days.
  5. Unmounts and lets the normal boot continue.

The result: every boot starts from a clean slate. Files written to / during a session vanish on the next reboot.
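
The five rollback steps can be sketched as a shell function. This is a minimal illustration of the pattern, not the repo's actual initrd script: the device path /dev/root_vg/root is inferred from the disk layout above, while the mount point /btrfs_tmp, the archive name format, and all function names are assumptions. It also assumes GNU (or busybox) date, stat, and find.

```shell
#!/bin/sh
# Assumed locations; override via environment if your layout differs.
BTRFS_DEV=${BTRFS_DEV:-/dev/root_vg/root}
MNT=${MNT:-/btrfs_tmp}

# Name an archived root after a modification time given in epoch seconds (UTC).
archive_name() {
    date -u -d "@$1" '+%Y-%m-%d_%H:%M:%S'
}

# Delete a subvolume after deleting any subvolumes nested below it.
delete_subvolume_recursively() {
    btrfs subvolume list -o "$1" | cut -f 9- -d ' ' |
        while IFS= read -r child; do
            delete_subvolume_recursively "$MNT/$child"
        done
    btrfs subvolume delete "$1"
}

rollback_root() {
    mkdir -p "$MNT"
    mount "$BTRFS_DEV" "$MNT"                  # 1. mount the raw btrfs volume
    mkdir -p "$MNT/old_roots"
    if [ -e "$MNT/root" ]; then                # 2. archive the current root
        stamp=$(archive_name "$(stat -c %Y "$MNT/root")")
        mv "$MNT/root" "$MNT/old_roots/$stamp"
    fi
    btrfs subvolume create "$MNT/root"         # 3. create a fresh empty root
    find "$MNT/old_roots" -mindepth 1 -maxdepth 1 -mtime +30 |
        while IFS= read -r old; do             # 4. prune archives older than 30 days
            delete_subvolume_recursively "$old"
        done
    umount "$MNT"                              # 5. let the normal boot continue
}
```

Nothing runs until rollback_root is called, so the functions can be read or sourced safely on a workstation. Keeping the old roots for 30 days means a file that was accidentally left outside /persist can still be recovered from /old_roots/<timestamp>.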

What persists

The impermanence module (modules/system/impermanence/default.nix) bind-mounts selected paths from /persist into the fresh /:

System state:

  • /etc/nixos, /var/log, /var/lib/nixos - NixOS metadata
  • /etc/NetworkManager/system-connections, /var/lib/NetworkManager - network config
  • /var/lib/kubelet, /var/lib/rancher/k3s, /etc/rancher - k3s state
  • /var/lib/longhorn, /var/lib/csi, /var/lib/docker - storage
  • /etc/machine-id, /etc/adjtime - stable machine identity

SSH host keys live at /persist/etc/ssh/ssh_host_ed25519_key and ssh_host_rsa_key. The openssh service is configured to use those paths so host key fingerprints don't change across reboots.

User home directories: ~/Code, ~/Documents, ~/.ssh, ~/.gnupg, ~/.config, ~/.local and ~/.config/sops are persisted per-user.

SOPS age key: /persist/etc/sops-nix/keys.txt - the age private key used to decrypt secrets.yaml.

k3s module

modules/system/k3s/default.nix configures k3s based on node.master:

  • Master (master = true): runs as server with --cluster-init, --disable=servicelb, --disable=traefik, --disable=local-storage. Writes kubeconfig to /etc/rancher/k3s/k3s.yaml (mode 0644 so the cluster user can read it).
  • Agent (master = false): runs as agent and derives the master's URL from vars.nodes at build time - serverAddr = "https://${masterHostname}:6443". No manual IP configuration needed.

Both roles use config.sops.secrets.k3s_token.path as the token file and run openiscsi (required by Longhorn).

The firewall is disabled cluster-wide (mkForce false) since k3s and MetalLB handle their own iptables rules. The host firewall interferes with CNI traffic.

Kubernetes chart deployment

On the master node, a NixOS activation script (kubernetes-prepare) and a systemd service (k8s-deploy) handle all Kubernetes deployments.

Activation script (runs on every nixos-rebuild switch / colmena apply):

  1. Writes rendered manifest YAML files to /var/lib/kubernetes/manifests/.
  2. Clears deployment sentinel files (/var/lib/kubernetes/.deploy-<group>-done) for any group whose chart content changed (detected by comparing Nix store paths).
  3. Restarts the k8s-deploy service.
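
Step 2 can be pictured as a small bookkeeping routine. The sketch below assumes the last-deployed store path of each group is remembered in a marker file next to the sentinel; the marker file name (.deploy-<group>-path) and the function name are hypothetical, and the real activation script may track changes differently.

```shell
#!/bin/sh
STATE_DIR=${STATE_DIR:-/var/lib/kubernetes}

# Compare the store path recorded for a group with the new one. If they
# differ, remove the group's sentinel so the next k8s-deploy run redeploys
# it, and record the new path.
# Usage: refresh_sentinel <group> <new-store-path>
refresh_sentinel() {
    group=$1
    new_path=$2
    marker="$STATE_DIR/.deploy-$group-path"     # hypothetical bookkeeping file
    sentinel="$STATE_DIR/.deploy-$group-done"
    old_path=$(cat "$marker" 2>/dev/null || true)
    if [ "$old_path" != "$new_path" ]; then
        rm -f "$sentinel"
        printf '%s\n' "$new_path" > "$marker"
    fi
}
```

Because a Nix store path changes whenever the rendered chart content changes, comparing paths is enough to detect a changed group without diffing the YAML itself.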

k8s-deploy service (oneshot, runs after k3s.service):

  1. Waits for the Kubernetes API to respond.
  2. Creates all required namespaces.
  3. Deletes all existing Jobs (to unblock re-runs).
  4. Reads SOPS secrets and creates/patches Kubernetes Secret objects for credentials that services reference.
  5. Deploys chart groups in dependency order, retrying each chart up to 3 times (configurable per group).
  6. After each group succeeds, writes a sentinel file and waits for the declared readiness conditions before moving to the next group.
  7. Runs the Nextcloud SSO setup script.
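
The retry and sentinel behaviour in steps 5 and 6 can be sketched roughly as follows. The function names, the RETRY_DELAY variable, and the messages are assumptions made for illustration; only the sentinel path pattern (/var/lib/kubernetes/.deploy-<group>-done) comes from this document.

```shell
#!/bin/sh
STATE_DIR=${STATE_DIR:-/var/lib/kubernetes}

# Run a command until it succeeds, at most <attempts> times.
# Usage: retry <attempts> <command...>
retry() {
    attempts=$1
    shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Deploy one group unless its sentinel already exists; write the sentinel
# only after the deploy command has succeeded.
# Usage: deploy_group <group> <retries> <command...>
deploy_group() {
    group=$1
    retries=$2
    shift 2
    sentinel="$STATE_DIR/.deploy-$group-done"
    if [ -e "$sentinel" ]; then
        echo "$group: already deployed, skipping"
        return 0
    fi
    if retry "$retries" "$@"; then
        touch "$sentinel"
        echo "$group: deployed"
    else
        echo "$group: failed after $retries attempts" >&2
        return 1
    fi
}
```

A call might look like deploy_group core-config 5 kubectl apply -f <manifest>, followed by a readiness wait such as kubectl rollout status before the next group starts. Because the sentinel survives service restarts, an unchanged group is skipped on the next run instead of being re-applied.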

Deployment groups, in order:

| Group | Charts | Waits for |
|---|---|---|
| core-infrastructure | longhorn, metallb | longhorn-driver-deployer, metallb-controller |
| core-config | metallb-config | - (retries 5×) |
| networking-services | ingress-nginx, pihole | ingress-nginx-controller |
| dns-services | externaldns-pihole | external-dns |
| external-access | cert-manager | cert-manager deployment |
| external-dns | externaldns-cloudflare, cert-manager-issuers, cloudflare-ddns | external-dns |
| external-ingress | pihole-external-ingress | - |
| vpn-services | wireguard-config, wireguard-caddy-cert, wireguard-storage, wireguard-deployment, wireguard-service | wireguard deployment |
| apps | signal-proxy, nextcloud | signal-proxy, nextcloud deployments |

SOPS secrets

All secrets live in modules/system/sops/secrets.yaml, encrypted with SOPS + age. The key file location on running nodes is /persist/etc/sops-nix/keys.txt. sops-nix decrypts secrets at activation time.

modules/system/sops/default.nix declares every secret and maps it to a file path that NixOS services can read. WireGuard user public keys are registered automatically by iterating over enabled entries in vars.wireguardUsers - each entry's publicKeySecret field becomes a SOPS secret name.

The Kubernetes deployment script reads decrypted secret paths from config.sops.secrets.<name>.path and pushes them into Kubernetes Secrets via kubectl patch.

Getting Started

This guide walks you through setting up nixlab from scratch: generating keys, filling in vars.nix, preparing your first node, and deploying the cluster.

Prerequisites

You need the following tools on your workstation (not on the nodes):

  • Nix with flakes enabled (experimental-features = nix-command flakes in ~/.config/nix/nix.conf)
  • age: for generating the encryption key
  • sops: for creating and editing the secrets file
  • Colmena: for deploying to nodes

Install them all with:

nix profile install nixpkgs#age nixpkgs#sops nixpkgs#colmena

Or temporarily via nix shell:

nix shell nixpkgs#age nixpkgs#sops nixpkgs#colmena
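
To confirm the tools are actually on your PATH before continuing, a quick check along these lines may help. The function name missing_tools is made up for this example; it relies only on the POSIX command -v builtin.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the four workstation prerequisites listed above.
missing=$(missing_tools nix age sops colmena)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "all prerequisites found"
fi
```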

Step 1: Clone the repo

git clone https://github.com/thatbagu/nixlab
cd nixlab

Step 2: Generate your age key

nixlab uses age for encrypting secrets. Generate a key pair:

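mkdir -p ~/.config/sops/age
age-keygen -o ~/.config/sops/age/keys.txt

This prints the public key to the terminal (run age-keygen -y ~/.config/sops/age/keys.txt to display it again later). Copy it - you need it in the next step.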

Now update .sops.yaml with your public key. Open it and replace the placeholder:

keys:
  - &primary age1REPLACE_WITH_YOUR_AGE_PUBLIC_KEY

Change age1REPLACE_WITH_YOUR_AGE_PUBLIC_KEY to the public key that age-keygen printed.

Step 3: Generate a cluster SSH key

All nodes use a single SSH key for cluster access:

ssh-keygen -t ed25519 -C "nixlab-cluster" -f ~/.ssh/nixlab-cluster

Note the public key:

cat ~/.ssh/nixlab-cluster.pub

Step 4: Fill in vars.nix

Open vars.nix and replace every placeholder value with your real values:

{
  username = "youruser";          # your Linux username
  timezone = "Europe/Berlin";     # timedatectl list-timezones
  clusterSshKey = "ssh-ed25519 AAAA... youruser@host";  # from step 3

  nodes = {
    master = {
      hostname = "mymaster";   # must match hosts/<hostname>/
      master   = true;
      disk     = "/dev/sda";   # check with lsblk on the target machine
      tags     = [ "homelab" "master" "mymaster" ];
    };
  };

  domain      = "yourdomain.example.com";  # Cloudflare-managed domain
  metallbPool = "192.168.1.192/26";        # outside your DHCP range
  piholeIp    = "192.168.1.250";
  wireguardIp = "192.168.1.194";
  nginxIp     = "192.168.1.193";
  upstreamDns = "192.168.1.1";            # your router

  wireguardUsers = {};  # add users later with add-wg-user.sh
}

piholeIp, wireguardIp, and nginxIp must each fall inside metallbPool, and the pool itself must sit inside your LAN subnet but outside your router's DHCP assignment range.
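
Whether an address falls inside a CIDR range is easy to get wrong by eye. The following sketch checks the example values from the vars.nix above; the function names are hypothetical, it handles IPv4 only, and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer. The body runs in a
# subshell so the IFS change does not leak into the caller.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if address $1 lies inside CIDR range $2 (e.g. 192.168.1.192/26).
ip_in_cidr() {
    _ip=$(ip_to_int "$1")
    _net=$(ip_to_int "${2%/*}")
    _bits=${2#*/}
    _mask=$(( (0xFFFFFFFF << (32 - $_bits)) & 0xFFFFFFFF ))
    [ $(( $_ip & $_mask )) -eq $(( $_net & $_mask )) ]
}

# Check the three service IPs against the pool from the example above.
pool="192.168.1.192/26"
for addr in 192.168.1.250 192.168.1.194 192.168.1.193; do
    if ip_in_cidr "$addr" "$pool"; then
        echo "$addr is inside $pool"
    else
        echo "$addr is OUTSIDE $pool"
    fi
done
```

Swap in your own pool and addresses before deploying; a service IP outside the pool is a configuration error that is much cheaper to catch here than after the cluster is up.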

See Configuration for a full field reference.

Step 5: Create and encrypt secrets.yaml

Copy the example file:

cp modules/system/sops/secrets.yaml.example modules/system/sops/secrets.yaml

Fill in the real values. For secrets you need to generate:

# k3s cluster join token - any long random string
openssl rand -hex 32

# WireGuard server keys
wg genkey | tee /tmp/wg-server.key | wg pubkey > /tmp/wg-server.pub
cat /tmp/wg-server.key   # wireguard_server_private_key
cat /tmp/wg-server.pub   # wireguard_server_public_key

# Linux user password hash (replace 'yourpassword')
mkpasswd -m sha-512 yourpassword

For wireguard_server_endpoint: use your public IP or a DDNS hostname. The DDNS service (if enabled) will keep Cloudflare updated, but the WireGuard endpoint in secrets.yaml is what clients use to connect.

For Cloudflare credentials: create an API token at https://dash.cloudflare.com/profile/api-tokens with Zone:DNS:Edit permission. The email is your Cloudflare account email.

For private_ssh_key: paste the private half of the cluster SSH key from step 3 in full, including the BEGIN and END lines.

Once secrets.yaml is filled in with real values, encrypt it:

sops --encrypt --in-place modules/system/sops/secrets.yaml

SOPS encrypts to the age public key listed in .sops.yaml. The encrypted file is safe to commit - commit it now:

git add modules/system/sops/secrets.yaml
git commit -m "add encrypted secrets"
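
Committing a plaintext secrets.yaml by accident is the main risk in this step. A guard like the one below could be run before any future commit that touches the file. It is a heuristic sketch with made-up function names: it only checks for the top-level sops: metadata block and at least one ENC[...] value, which is how SOPS-encrypted YAML normally looks.

```shell
#!/bin/sh
# Succeed only if the file looks SOPS-encrypted: it must contain the
# top-level "sops:" metadata key and at least one ENC[...] value.
is_sops_encrypted() {
    grep -q '^sops:' "$1" && grep -q 'ENC\[' "$1"
}

# Hypothetical wrapper: refuse to stage the file while it is still plaintext.
stage_secrets() {
    if is_sops_encrypted "$1"; then
        git add "$1"
    else
        echo "refusing to stage $1: not SOPS-encrypted" >&2
        return 1
    fi
}
```

For example, stage_secrets modules/system/sops/secrets.yaml stages the file only once it has been encrypted.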

Step 6: Prepare the first node's hardware config

Boot your target machine with a NixOS installer ISO. Once booted:

nixos-generate-config --no-filesystems
cat /etc/nixos/hardware-configuration.nix

The --no-filesystems flag skips filesystem detection (Disko handles that). Copy the output to your workstation:

mkdir -p hosts/mymaster
# paste the hardware-configuration.nix content here

The file should look something like:

{ config, lib, modulesPath, ... }:
{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  boot.initrd.availableKernelModules = [ "nvme" "xhci_pci" "ahci" "usbhid" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ "kvm-intel" ];  # or kvm-amd
  boot.extraModulePackages = [ ];

  swapDevices = [ ];

  networking.useDHCP = lib.mkDefault true;
  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
}

The hostname in vars.nodes must match the directory name under hosts/.

Step 7: Initial install

From the NixOS installer on the target machine, with the repo available (via git clone or a mounted drive):

# Install using Disko to partition the disk, then NixOS
nix run github:nix-community/disko -- --mode disko --flake .#mymaster
nixos-install --flake .#mymaster --no-root-password

Alternatively, if you already have a running NixOS system on the node (even a minimal one), you can deploy directly from your workstation:

colmena apply --on mymaster

Colmena connects via SSH (<hostname>.local using mDNS, as the targetHost in the flake) and switches the system.

Step 8: Verify

After the install finishes and the node reboots into the new system:

# SSH into the master
ssh -i ~/.ssh/nixlab-cluster youruser@mymaster.local

# Check k3s is running
sudo k3s kubectl get nodes

# Watch service deployment (takes a few minutes on first boot)
sudo journalctl -fu k8s-deploy

The k8s-deploy service applies all Kubernetes charts in dependency order. Once it finishes, all services should be running:

sudo k3s kubectl get pods --all-namespaces

Step 9: Add more nodes

See Adding Nodes.

Step 10: Add VPN users

See WireGuard VPN.

Subsequent deploys

After changing vars.nix or any module:

# Deploy to all nodes
colmena apply

# Deploy only to master
colmena apply --on @master

# Deploy only to workers
colmena apply --on @worker

# Deploy to a specific node
colmena apply --on mymaster

Colmena uses the tags field in vars.nodes entries to resolve @master and @worker selectors.

Configuration Reference

vars.nix is the only file you edit to configure nixlab. It is a plain Nix attribute set that the flake and all modules import directly.

This page documents every field.


Top-level fields

username

Type: string
Example: "alice"

The Linux username created on every cluster node. This user:

  • Has sudo access (NOPASSWD for all commands)
  • Can SSH in with clusterSshKey
  • Owns the SOPS-decrypted secrets that need non-root access
  • Has their home directory persisted at /persist/home/<username>/

The same username is used on all nodes. There is no per-node user configuration.


timezone

Type: string
Example: "Europe/Berlin"

The timezone for all cluster nodes. Set via time.timeZone.

Find valid values:

timedatectl list-timezones

clusterSshKey

Type: string (SSH public key)
Example: "ssh-ed25519 AAAA... user@host"

The SSH public key added to ~/.ssh/authorized_keys for username on every node. This is how Colmena connects to nodes for deployment.

Generate a dedicated cluster key:

ssh-keygen -t ed25519 -C "nixlab-cluster" -f ~/.ssh/nixlab-cluster

nodes

Type: attribute set of node definitions
Default: {} (empty - cluster won't deploy)

Each attribute in nodes defines one cluster node. The attribute name is arbitrary (used as a label in Colmena); the hostname field is what actually matters.

nodes = {
  master = {
    hostname = "mymaster";
    master   = true;
    disk     = "/dev/sda";
    tags     = [ "homelab" "master" "mymaster" ];
  };
  worker1 = {
    hostname = "worker1";
    master   = false;
    disk     = "/dev/nvme0n1";
    tags     = [ "homelab" "worker" "worker1" ];
  };
};

nodes.<name>.hostname

Type: string
Required

The NixOS hostname for this node. Must match:

  • The directory name under hosts/ containing hardware-configuration.nix
  • The hostname the machine announces via mDNS (i.e., <hostname>.local must resolve on your LAN)

The flake sets networking.hostName = hostname for each node.

nodes.<name>.master

Type: bool
Required

Set to true for exactly one node. That node runs the k3s server process and all Kubernetes workloads (via the k8s-deploy service). All other nodes are k3s agents.

The master's hostname is derived automatically from vars.nodes at build time and embedded in the agent serverAddr - no manual IP configuration needed.

nodes.<name>.disk

Type: string (device path)
Required
Example: "/dev/sda", "/dev/nvme0n1"

The block device Disko will partition. This disk will be completely wiped during installation. Verify with lsblk on the target machine before setting this value.

nodes.<name>.espSize

Type: string (size with unit)
Default: "500M"

Size of the EFI system partition. The default is sufficient for most setups. Increase if you store many NixOS generations in /boot.

nodes.<name>.tags

Type: list of strings
Required

Colmena deployment tags. Used to target groups of nodes:

colmena apply --on @master   # deploys to nodes tagged "master"
colmena apply --on @worker   # deploys to nodes tagged "worker"

Include at minimum the role tag ("master" or "worker") and the hostname. Additional tags are arbitrary.


Networking fields

domain

Type: string
Example: "home.example.com"

Your public domain, managed by Cloudflare. Used for:

  • Let's Encrypt TLS certificates (DNS-01 challenge via Cloudflare)
  • External DNS records (via ExternalDNS Cloudflare provider)
  • Nextcloud's hostname: nextcloud.<domain>
  • Signal proxy hostname: signal.<domain>
  • Pi-hole external ingress: pihole.<domain>

The domain must be in a Cloudflare-managed zone. The Cloudflare API token in secrets.yaml must have Zone:DNS:Edit permission for this zone.


metallbPool

Type: string (CIDR)
Example: "192.168.1.192/26"

The IP range MetalLB draws from when assigning LoadBalancer IPs. Must be:

  • Within your LAN subnet
  • Outside your router's DHCP range
  • Large enough to hold piholeIp, wireguardIp, nginxIp and any future services

A /26 covers 64 addresses (192.168.1.192 through 192.168.1.255 in the example), which is more than enough.


piholeIp

Type: string (IP address)
Example: "192.168.1.250"

The static IP assigned to Pi-hole's LoadBalancer service. Pi-hole serves both DNS (port 53) and the web UI on this IP.

Configure your router to hand out this IP as the DNS server for your LAN clients, or set it manually on each device.

Must be within metallbPool.


wireguardIp

Type: string (IP address)
Example: "192.168.1.194"

The static IP assigned to the WireGuard LoadBalancer service (UDP port 51820). VPN clients connect to this IP.

The caddy sidecar inside the WireGuard pod also listens on this IP for HTTPS (TCP 443) to serve Nextcloud with header-injected SSO.

Must be within metallbPool.


nginxIp

Type: string (IP address)
Example: "192.168.1.193"

The static IP assigned to the nginx ingress controller. All HTTP/S traffic for cluster services routes through this IP. ExternalDNS registers ingress hostnames pointing here.

Must be within metallbPool.


upstreamDns

Type: string (IP address)
Example: "192.168.1.1"

The upstream DNS resolver Pi-hole forwards non-blocked queries to. Typically your router's LAN IP.


wireguardUsers

Type: attribute set of user definitions
Default: {} (no VPN users)

Each attribute defines one WireGuard VPN user. Users are applied to the WireGuard server configuration and, optionally, to Nextcloud SSO.

wireguardUsers = {
  "alice" = {
    ip              = "10.0.100.2";
    group           = "admin";
    publicKeySecret = "alice_wg_public_key";
    allowedIPs      = "0.0.0.0/0";
    nextcloudUser   = "alice";
    description     = "Alice - full admin access";
    enabled         = true;
  };
};

wireguardUsers.<name>.ip

Type: string (IP address)
Example: "10.0.100.2"

The VPN IP assigned to this user. Must be unique within the 10.0.100.0/24 range. The server uses .1; users start at .2.

Use add-wg-user.sh to assign IPs automatically - it reads existing allocations from vars.nix and picks the next free one.
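
The allocation rule (server on .1, users from .2 upward, first free address wins) can be sketched as below. This is not the code of add-wg-user.sh, just an illustration of the idea; the function name and the way used addresses are passed in are assumptions.

```shell
#!/bin/sh
# Print the first unused address in 10.0.100.2 - 10.0.100.254, given the
# addresses already taken (one per argument). .1 is reserved for the server.
next_free_vpn_ip() {
    host=2
    while [ "$host" -le 254 ]; do
        candidate="10.0.100.$host"
        taken=no
        for used in "$@"; do
            if [ "$used" = "$candidate" ]; then
                taken=yes
            fi
        done
        if [ "$taken" = no ]; then
            echo "$candidate"
            return 0
        fi
        host=$((host + 1))
    done
    echo "no free address left in 10.0.100.0/24" >&2
    return 1
}
```

Picking the lowest free address rather than the highest used plus one means that an address freed by deleting a user is reused.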

wireguardUsers.<name>.group

Type: string
Example: "admin", "family", "friends", "guests"

An arbitrary access group label. Not currently enforced by the system (no firewall rules are generated per group), but useful for documentation and future policy enforcement.

wireguardUsers.<name>.publicKeySecret

Type: string
Example: "alice_wg_public_key"

The name of the SOPS secret that holds this user's WireGuard public key. The secret must exist in modules/system/sops/secrets.yaml.

add-wg-user.sh creates this entry automatically.

wireguardUsers.<name>.allowedIPs

Type: string (CIDR or comma-separated CIDRs)
Example: "0.0.0.0/0", "192.168.1.0/24"

The destination ranges the client routes through the VPN tunnel. "0.0.0.0/0" sends all IPv4 traffic through the VPN (full tunnel). A LAN CIDR such as "192.168.1.0/24" sends only homelab traffic through it (split tunnel).
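
On the client side, this value ends up in the AllowedIPs line of the peer section. The sketch below renders a wg-quick style client config from caller-supplied values; it is not what add-wg-user.sh produces, the function name is made up, and the /32 address suffix is an assumption. Port 51820 is the WireGuard port used elsewhere in this document.

```shell
#!/bin/sh
# Render a client config on stdout. Nothing here reads vars.nix or
# secrets.yaml; every value is passed in by the caller.
# Usage: render_client_conf <client-ip> <client-private-key> \
#                           <server-public-key> <endpoint-host> <allowed-ips>
render_client_conf() {
    cat <<EOF
[Interface]
PrivateKey = $2
Address = $1/32

[Peer]
PublicKey = $3
Endpoint = $4:51820
AllowedIPs = $5
EOF
}
```

Calling it with 0.0.0.0/0 as the last argument yields a full-tunnel config, while 192.168.1.0/24 yields a split-tunnel config that leaves the client's other traffic on its normal route.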

wireguardUsers.<name>.nextcloudUser

Type: string (optional)
Example: "alice"

If set, the caddy sidecar injects X-Remote-User: <nextcloudUser> when this VPN user (identified by their VPN IP) connects to Nextcloud over the VPN. Nextcloud trusts this header for automatic login - no password prompt when accessing from the VPN.

Omit this field for users who should not have Nextcloud SSO.

wireguardUsers.<name>.description

Type: string
Example: "Alice - full admin access"

A human-readable description. Not used by the system; for documentation only.

wireguardUsers.<name>.enabled

Type: bool

Set to false to disable a user without removing their entry. Disabled users are excluded from the WireGuard server config and their SOPS secret is not registered.

Adding Nodes

Adding a node to the cluster is a four-step process: prepare the hardware config, register the node in vars.nix, do the initial install, and verify that the node joined.

Step 1: Get the hardware configuration

Boot the target machine with a NixOS installer ISO. Once booted, generate the hardware config:

nixos-generate-config --no-filesystems

The --no-filesystems flag skips auto-detected filesystem entries - Disko generates those declaratively, so you don't want them duplicated.

Copy the generated file to your workstation. The node's hostname in vars.nix must match the directory you create here:

mkdir -p hosts/<hostname>
# copy /etc/nixos/hardware-configuration.nix to hosts/<hostname>/hardware-configuration.nix

A minimal hardware config looks like:

{ config, lib, modulesPath, ... }:
{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];

  boot.initrd.availableKernelModules = [ "nvme" "xhci_pci" "ahci" "usbhid" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ "kvm-intel" ];  # or kvm-amd for AMD CPUs
  boot.extraModulePackages = [ ];

  swapDevices = [ ];

  networking.useDHCP = lib.mkDefault true;
  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
}

Keep only what nixos-generate-config produces - the kernel modules and firmware detection. Remove any fileSystems or disko entries if they appear.

Step 2: Add the node to vars.nix

Open vars.nix and add an entry under nodes:

nodes = {
  master = { ... };  # existing master

  worker1 = {
    hostname = "worker1";      # must match hosts/<hostname>/ directory
    master   = false;          # true for exactly one node
    disk     = "/dev/sda";     # verify with lsblk on the target
    tags     = [ "homelab" "worker" "worker1" ];
  };
};

To check the disk device on the target machine while it's running the installer:

lsblk -d -o NAME,SIZE,MODEL

Pick the device you want to install to. The entire disk will be wiped.

Step 3: Initial installation

Option A: Install from the NixOS installer (for new machines)

On the target machine, with the nixlab repo accessible (clone it or mount it):

cd nixlab

# Partition and format the disk with Disko
nix run github:nix-community/disko -- --mode disko --flake .#<hostname>

# Install NixOS
nixos-install --flake .#<hostname> --no-root-password

Disko reads vars.nodes.<name>.disk and creates the GPT → LVM → btrfs layout automatically.

After nixos-install completes, reboot:

reboot

Option B: Deploy from your workstation (for machines with SSH access)

If the target already has a running NixOS system with SSH access:

colmena apply --on <hostname>

Colmena connects to <hostname>.local (mDNS) as vars.username and switches the system.

Note: if the target is not yet running NixOS (e.g., it's running another distro), use Option A.

Step 4: Verify

SSH into the new node:

ssh -i ~/.ssh/nixlab-cluster <username>@<hostname>.local

Check it joined the k3s cluster:

# From the master node
sudo k3s kubectl get nodes

The new node should appear with status Ready within a minute or two of booting. k3s agent derives the master's address from the compiled-in serverAddr - no manual configuration on the worker is needed.

If the node is a worker, Longhorn will automatically pick it up for replica scheduling once the Longhorn manager pod starts on the new node.

Subsequent deploys

After the initial install, subsequent config changes are applied with:

# Deploy only to the new node
colmena apply --on <hostname>

# Or deploy to all workers at once
colmena apply --on @worker

Removing a node

  1. Drain the node in Kubernetes to migrate workloads away:

    kubectl drain <hostname> --ignore-daemonsets --delete-emptydir-data
    
  2. Delete the node from the cluster:

    kubectl delete node <hostname>
    
  3. Remove the entry from vars.nodes in vars.nix.

  4. Delete the hosts/<hostname>/ directory.

  5. Run colmena apply so the remaining nodes are rebuilt from the updated vars.nix.

Services

All services run on the master node as Kubernetes workloads. They are deployed and managed by the k8s-deploy systemd service, which applies charts in dependency order every time the NixOS configuration changes.

Service configuration lives in modules/system/k8s/services/. Most values are driven by vars.nix. Charts are pulled from nixhelm and rendered to YAML by nix-kube-generators.


MetalLB

Namespace: metallb-system
Chart source: nixhelm / metallb
IP pool: vars.metallbPool

MetalLB provides LoadBalancer-type Services on bare metal by assigning IPs from vars.metallbPool and responding to ARP requests on your LAN. Without MetalLB, type: LoadBalancer services would stay in <Pending> state forever.

The pool is configured via a metallb-config manifest (an IPAddressPool + L2Advertisement resource pair) deployed in the core-config group after MetalLB itself is ready.

Services that get a static IP from the pool:

  • Pi-hole: vars.piholeIp
  • WireGuard: vars.wireguardIp
  • nginx ingress: vars.nginxIp

Longhorn

Namespace: longhorn-system
Chart source: nixhelm / longhorn

Longhorn provides distributed block storage across all cluster nodes. It creates replicated PersistentVolume objects for stateful services (Nextcloud, its PostgreSQL database, WireGuard data).

Every node has the Longhorn node label set at k3s startup:

--node-label=node.longhorn.io/create-default-disk=true

This tells Longhorn to use the node's default disk (under /var/lib/longhorn/) for replica storage. Longhorn data is persisted across reboots via the impermanence module (/var/lib/longhorn is bind-mounted from /persist).

Storage class name: longhorn (used by all PVC definitions in the cluster).


nginx ingress

Namespace: nginx-system
Chart source: nixhelm / ingress-nginx
IP: vars.nginxIp

The nginx ingress controller handles all external HTTP/S traffic. Ingress objects in other namespaces use ingressClassName: nginx.

TLS certificates are issued by cert-manager (requested via the cert-manager.io/cluster-issuer annotation on each Ingress). nginx terminates TLS with those certificates and passes the decrypted request upstream to the service.

The controller is assigned a static LoadBalancer IP from MetalLB (vars.nginxIp). ExternalDNS watches Ingress objects and registers their hostnames in Pi-hole (local) and Cloudflare (external).


Pi-hole

Namespace: pihole-system
Chart source: nixhelm / mojo2600/pihole
Version: configured in charts.nix (vars.versions.pihole)
IP: vars.piholeIp (shared by DNS and web UI)

Pi-hole provides LAN-wide DNS filtering and ad blocking. Both the DNS service (UDP/TCP 53) and the web UI share the same LoadBalancer IP via MetalLB's IP sharing (metallb.universe.tf/allow-shared-ip).

Upstream DNS: vars.upstreamDns (typically your router).

Custom DNS entries are injected at deploy time:

  • pihole.home → vars.piholeIp (local admin UI access)
  • nextcloud.<vars.domain> → 10.0.100.1 - routes VPN clients to the Caddy sidecar instead of the nginx ingress. This enables Nextcloud's VPN-based SSO without affecting LAN clients, who hit nginx normally.

To access the Pi-hole admin UI: http://pihole.home/admin from your LAN (configure your device's DNS to point to vars.piholeIp first, or set it on the router for all devices).


ExternalDNS (Pi-hole)

Namespace: pihole-system
Chart source: nixhelm / external-dns

Watches Ingress and Service objects and automatically registers/removes DNS entries in Pi-hole. This means any service with an Ingress gets a local DNS name without manual Pi-hole configuration.

The ExternalDNS Pi-hole provider reads the Pi-hole API to manage entries. It runs in the same namespace as Pi-hole.


ExternalDNS (Cloudflare)

Namespace: external-dns
Chart source: nixhelm / external-dns

A second ExternalDNS instance that registers public DNS records in Cloudflare. It watches the same Ingress objects but only registers hostnames that match vars.domain.

Requires the Cloudflare API token (from secrets.yaml) injected as a Kubernetes Secret.


cert-manager

Namespace: cert-manager
Chart source: nixhelm / cert-manager

Issues Let's Encrypt TLS certificates via DNS-01 challenge. DNS-01 is required for wildcard certificates and works without the cluster being publicly reachable, because cert-manager answers the challenge by creating a TXT record through the Cloudflare API.

Two ClusterIssuer resources are created:

  • letsencrypt-prod - production certificates (used by all services)
  • letsencrypt-staging - for testing without hitting rate limits

To use staging: change vars.tls.defaultIssuer in charts.nix to "letsencrypt-staging".

The Cloudflare API token and email are injected as Kubernetes Secrets from SOPS (cloudflare_token, cloudflare_email).


DDNS (Cloudflare)

Namespace: external-dns
Chart source: custom manifest

A CronJob that periodically resolves your public IP and updates a Cloudflare A record. This keeps your public DNS pointing at your home IP even if it changes.

The Cloudflare credentials are the same ones used by cert-manager.
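
As a rough illustration of what one run of such a job might do, here is a minimal bash sketch. It is not the manifest shipped in the repo: the environment variables CF_API_TOKEN, CF_ZONE_ID and CF_RECORD_ID, the helper names, and the use of api.ipify.org to discover the public IP are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a single DDNS run; not the CronJob from the repo.
# CF_API_TOKEN, CF_ZONE_ID, CF_RECORD_ID and the IP-echo URL are assumptions.

# Build the JSON body for a Cloudflare A-record update (ttl 1 = automatic).
ddns_payload() {
  local name="$1" ip="$2"
  printf '{"type":"A","name":"%s","content":"%s","ttl":1,"proxied":false}' "$name" "$ip"
}

# Update the record only when the public IP differs from the last one seen.
ddns_update() {
  local name="$1" last_ip="$2" ip
  ip="$(curl -fsS https://api.ipify.org)" || return 1
  if [ "$ip" = "$last_ip" ]; then
    return 0
  fi
  curl -fsS -X PATCH \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$(ddns_payload "$name" "$ip")" \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records/${CF_RECORD_ID}"
}
```

Skipping the API call when the IP is unchanged keeps the job cheap enough to run every few minutes.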


WireGuard

Namespace: wireguard-system
Image: lscr.io/linuxserver/wireguard:latest
IP: vars.wireguardIp

The WireGuard pod runs two containers:

wireguard container: the VPN server itself. The server config (wg0.conf) is generated from a template in a ConfigMap. An init container fills in the server private key and user public keys by reading Kubernetes Secrets (which the k8s-deploy service populated from SOPS).

Caddy sidecar: Caddy terminates HTTPS on port 443 of the WireGuard pod's IP (10.0.100.1 inside the VPN). It matches incoming requests by the client's VPN IP and injects X-Remote-User: <nextcloudUser> before proxying to Nextcloud's internal service. This gives VPN users automatic login to Nextcloud without a password.

The Caddy TLS certificate is issued by cert-manager for nextcloud.<vars.domain>. Kubernetes Secrets are namespaced and a pod can only mount Secrets from its own namespace, so Caddy (in wireguard-system) cannot use the certificate Secret that lives in the nextcloud namespace. A dedicated Certificate resource is therefore created in wireguard-system.

Pi-hole's custom DNS routes nextcloud.<vars.domain> to 10.0.100.1 for VPN clients, so the VPN client's HTTPS request hits Caddy instead of nginx.

VPN user public keys are stored in SOPS and pushed to a Kubernetes Secret (wireguard-secrets) by the deployment script. The init container reads them at pod start and substitutes the placeholders in wg0.conf.
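
The substitution step could look roughly like the bash sketch below. The __NAME__ placeholder convention, the helper name, and the assumption that the Secret is mounted as a directory with one file per key are all hypothetical; the real init container may work differently.

```shell
#!/usr/bin/env bash
# fill_placeholders TEMPLATE SECRETS_DIR
# Replace every __NAME__ token in TEMPLATE with the contents of SECRETS_DIR/NAME.
# The __NAME__ convention and the one-file-per-key layout are assumptions.
fill_placeholders() {
  local template="$1" dir="$2" conf file name value
  conf="$(cat "$template")"
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    name="$(basename "$file")"
    value="$(cat "$file")"
    # Plain string replacement; WireGuard keys are base64, so they contain
    # no characters that are special to this expansion.
    conf="${conf//__${name}__/${value}}"
  done
  printf '%s\n' "$conf"
}
```

Usage inside a container might be `fill_placeholders /config/wg0.conf.tpl /secrets > /config/wg0.conf`, where both paths are placeholders.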

For managing VPN users, see WireGuard VPN.


Nextcloud

Namespace: nextcloud
Chart source: nixhelm / nextcloud
URL: https://nextcloud.<vars.domain>

Nextcloud is deployed with:

  • PostgreSQL: 8Gi Longhorn PVC
  • Redis: for session caching
  • 100Gi Longhorn PVC: for file storage
  • nginx ingress with cert-manager TLS

All credentials (admin password, DB password, Redis password) come from SOPS secrets pushed to Kubernetes Secrets before the chart deploys.

Nextcloud is configured to trust the k3s pod CIDR (10.42.0.0/16) as a proxy and to accept X-Forwarded-For headers. It also trusts the X-Remote-User header for auto-login when the request comes via the WireGuard caddy sidecar.

The Nextcloud SSO setup script (modules/system/k8s/scripts/nextcloud-sso.nix) runs after all charts are deployed to configure trusted domains and the remote user header.


Signal proxy

Namespace: signal-proxy
URL: https://signal.<vars.domain>

A Signal messenger proxy that lets Signal clients connect through your homelab instead of directly to Signal's servers. Useful for regions where Signal is blocked.

The ingress uses cert-manager TLS and the nginx ingress controller.

WireGuard VPN

nixlab includes a WireGuard VPN server running as a Kubernetes pod on the master node. It provides encrypted remote access to your homelab and optional automatic login to Nextcloud for VPN users.

How it works

The WireGuard pod runs two containers:

  • wireguard: the VPN server on UDP port 51820 (vars.wireguardIp)
  • caddy sidecar: an HTTPS proxy on TCP port 443 that injects X-Remote-User headers for Nextcloud SSO

When a VPN user connects to Nextcloud from within the tunnel, Pi-hole resolves nextcloud.<vars.domain> to 10.0.100.1 (the WireGuard server's VPN IP) instead of the nginx ingress IP. The request hits Caddy, which identifies the user by their VPN IP, injects the X-Remote-User header with their Nextcloud username, and proxies to Nextcloud. Nextcloud trusts this header and logs the user in automatically.

LAN users (not on VPN) resolve nextcloud.<vars.domain> to the nginx ingress and go through normal authentication.

VPN subnet

The VPN uses 10.0.100.0/24:

  • 10.0.100.1 - WireGuard server (caddy sidecar also listens here)
  • 10.0.100.2 and up - clients (assigned per user in vars.wireguardUsers)

DNS for VPN clients is vars.piholeIp - Pi-hole blocks ads and resolves local hostnames for VPN users the same as LAN users.

Adding a user with add-wg-user.sh

The script at modules/system/sops/add-wg-user.sh automates the full onboarding flow:

cd nixlab
bash modules/system/sops/add-wg-user.sh <username>

What it does:

  1. Reads vars.wireguardUsers via nix eval to find used IPs.
  2. Picks the next free IP in 10.0.100.0/24.
  3. Generates a WireGuard keypair with wg genkey.
  4. Stores the public and private keys in modules/system/sops/secrets.yaml via sops --set.
  5. Reads the server public key and endpoint from SOPS.
  6. Prints a ready-to-use client config and the vars.nix snippet to add.
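
Step 2 is the only part with real logic. A minimal sketch of it, assuming the used addresses are passed as arguments (the real script gets them from nix eval) and that clients occupy 10.0.100.2 through 10.0.100.254; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# next_free_ip USED_IP...
# Print the lowest client address not already in use.
# Assumes clients live in 10.0.100.2 - 10.0.100.254 (.1 is the server).
next_free_ip() {
  local prefix="10.0.100" host ip u used
  for ((host = 2; host <= 254; host++)); do
    ip="${prefix}.${host}"
    used=0
    for u in "$@"; do
      if [ "$u" = "$ip" ]; then
        used=1
        break
      fi
    done
    if [ "$used" -eq 0 ]; then
      echo "$ip"
      return 0
    fi
  done
  echo "no free address left in ${prefix}.0/24" >&2
  return 1
}
```

Because the scan always starts at .2, an address freed by removing a user is reused before higher ones.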

Example output:

Client config for alice
============================
[Interface]
PrivateKey = <alice-private-key>
Address = 10.0.100.2/32
DNS = 192.168.1.250

[Peer]
PublicKey = <server-public-key>
Endpoint = <your-public-ip>:51820
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25

Add to vars.nix wireguardUsers
============================
"alice" = {
  ip              = "10.0.100.2";
  publicKeySecret = "alice_wg_public_key";
  allowedIPs      = "0.0.0.0/0";
  enabled         = true;
};
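
The client block in this output is simple enough to render with a heredoc. The helper below is a hypothetical sketch of that step, not the code from add-wg-user.sh; it hardcodes the port, AllowedIPs and keepalive values shown in the example.

```shell
#!/usr/bin/env bash
# render_client_config PRIVATE_KEY CLIENT_IP DNS_IP SERVER_PUBLIC_KEY ENDPOINT_HOST
# Print a wg-quick style client config (full tunnel, port 51820).
render_client_config() {
  cat <<EOF
[Interface]
PrivateKey = $1
Address = $2/32
DNS = $3

[Peer]
PublicKey = $4
Endpoint = $5:51820
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25
EOF
}
```

Redirecting the output to a file such as alice.conf gives something the WireGuard apps can import directly.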

After running the script:

  1. Paste the vars.nix snippet into wireguardUsers, adding group, description, and optionally nextcloudUser:

    wireguardUsers = {
      "alice" = {
        ip              = "10.0.100.2";
        group           = "admin";
        publicKeySecret = "alice_wg_public_key";
        allowedIPs      = "0.0.0.0/0";
        nextcloudUser   = "alice";   # optional: enables Nextcloud SSO
        description     = "Alice - full access";
        enabled         = true;
      };
    };
    
  2. Deploy to the master:

    colmena apply --on @master
    

    The activation script updates the WireGuard ConfigMap with the new peer. The k8s-deploy service restarts the WireGuard pod to apply the new config.

  3. Send the client config to the user (the block printed by the script). The private key is already embedded - the user just imports it.

Retrieving a user's private key later

The private key is stored in SOPS. To recover it:

sops --decrypt --extract '["alice_wg_private_key"]' modules/system/sops/secrets.yaml

Disabling a user

Set enabled = false in vars.nix:

"alice" = {
  ...
  enabled = false;
};

Deploy: colmena apply --on @master

The user's public key is removed from the WireGuard server config and their SOPS secret is unregistered from NixOS. The secret entry in secrets.yaml is left in place (to preserve the key if you re-enable the user).

Removing a user entirely

  1. Set enabled = false and deploy to confirm the peer is removed.
  2. Remove the user's entry from vars.wireguardUsers in vars.nix.
  3. Remove their keys from secrets.yaml:
    sops modules/system/sops/secrets.yaml
    # delete the alice_wg_public_key and alice_wg_private_key entries
    
  4. Deploy: colmena apply --on @master

Client setup

Linux (wg-quick)

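# Save the config printed by add-wg-user.sh as /etc/wireguard/nixlab.conf
sudo mkdir -p /etc/wireguard
sudo nano /etc/wireguard/nixlab.conf   # paste the [Interface] + [Peer] block
sudo chmod 600 /etc/wireguard/nixlab.conf   # the file contains the private key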

# Connect
sudo wg-quick up nixlab

# Disconnect
sudo wg-quick down nixlab

# Auto-start on boot
sudo systemctl enable wg-quick@nixlab

macOS

Install the WireGuard app from the App Store. Click the + button and import the config file (save the output of add-wg-user.sh as a .conf file).

iOS / Android

Install the WireGuard app from the App Store or Google Play. Use the QR code option - generate a QR code from the config on your workstation:

# Render the config as a QR code in the terminal (runs qrencode without installing it)
nix run nixpkgs#qrencode -- -t ansiutf8 < alice.conf

Or use the app's "Import from file" option.

Windows

Install WireGuard for Windows. Use "Import tunnel(s) from file" and select the .conf file.

Access groups

The group field in vars.wireguardUsers is a label - it doesn't currently enforce any network policy. It's intended for documentation and future use (e.g., network policies to restrict which cluster services different groups can reach).

Suggested conventions:

  • admin - full access, including Nextcloud SSO
  • family - homelab services, limited external routing
  • friends - split tunnel, homelab access only
  • guests - internet-only via VPN (no homelab access)

To implement network isolation between groups, add Kubernetes NetworkPolicy resources in the relevant namespaces based on the source VPN IP ranges for each group.

Nextcloud SSO mechanics

The caddy sidecar's Caddyfile is generated at build time from vars.wireguardUsers. For each user with nextcloudUser set, it generates a block like:

@alice remote_ip 10.0.100.2
handle @alice {
  reverse_proxy http://nextcloud.nextcloud.svc.cluster.local:8080 {
    header_up Host nextcloud.yourdomain.example.com
    header_up X-Remote-User "alice"
  }
}

Requests from unrecognized VPN IPs get a 403. VPN users without nextcloudUser set will receive a 403 when accessing Nextcloud over the VPN - they should use the normal LAN or internet path instead.
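
In nixlab this generation happens in Nix at build time. Purely to illustrate the mapping from users to Caddy blocks, here is the same idea as a bash sketch. The input format (one "user ip nextcloudUser" line per user), the function names, and the matcher-less fallback handle that returns 403 are assumptions for this example.

```shell
#!/usr/bin/env bash
# caddy_sso_block USER VPN_IP NEXTCLOUD_USER HOST
# Print the matcher and handle block for one SSO-enabled user.
caddy_sso_block() {
  local user="$1" ip="$2" nc_user="$3" host="$4"
  cat <<EOF
@${user} remote_ip ${ip}
handle @${user} {
  reverse_proxy http://nextcloud.nextcloud.svc.cluster.local:8080 {
    header_up Host ${host}
    header_up X-Remote-User "${nc_user}"
  }
}
EOF
}

# render_caddy_handlers HOST < users
# Reads "user ip nextcloudUser" lines from stdin. Users whose third field is
# empty get no block, so they fall through to the 403 fallback.
render_caddy_handlers() {
  local host="$1" user ip nc_user
  while read -r user ip nc_user; do
    if [ -n "$nc_user" ]; then
      caddy_sso_block "$user" "$ip" "$nc_user" "$host"
    fi
  done
  printf 'handle {\n  respond 403\n}\n'
}
```

Keying the matcher on the VPN IP means the identity check is only as strong as WireGuard's binding of each peer key to its allowed address.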

The Nextcloud Helm chart is configured with:

'trusted_proxies' => ['10.42.0.0/16'],
'forwarded_for_headers' => ['HTTP_X_FORWARDED_FOR'],

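This makes Nextcloud treat requests from the k3s pod CIDR - where caddy runs - as coming from a trusted proxy. Trust in the X-Remote-User header itself is configured by the Nextcloud SSO setup script (modules/system/k8s/scripts/nextcloud-sso.nix).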

Adding a Kubernetes Chart

All Kubernetes services are defined as Nix files under modules/system/k8s/services/. There is no raw YAML in the repo - everything is rendered at build time by nix-kube-generators. Helm charts are sourced from nixhelm. This page walks through adding a new service from scratch.

The four-step process

  1. Create a service file under services/<category>/
  2. Add a namespace to vars.namespaces in charts.nix (if needed)
  3. Import the file in charts.nix
  4. Add the chart name(s) to a deployment group in default.nix

Step 1: Write the service file

Create modules/system/k8s/services/<category>/myapp.nix. Every service file has the same signature:

{ pkgs, inputs, lib, vars }:

It returns an attrset where each key is a chart name and each value is the result of lib.mkChart, lib.mkRawManifest, or lib.mkSecretRef.

Helm chart

Use lib.mkChart when there is a Helm chart available via nixhelm:

{ pkgs, inputs, lib, vars }:

{
  myapp = lib.mkChart {
    name      = "myapp";
    chart     = lib.nixhelm.<org>.<chart>;   # see nixhelm.charts for available charts
    namespace = vars.namespaces.myapp;
    values    = {
      replicaCount = 1;
      service.type = "LoadBalancer";
      service.loadBalancerIP = vars.ipPools.nginxExternal;
    };
  };
}

If you want to layer defaults with overrides:

let
  defaults = { replicaCount = 1; resources.limits.memory = "256Mi"; };
  overrides = { service.type = "ClusterIP"; };
  finalValues = lib.overlayValues defaults overrides;
in {
  myapp = lib.mkChart {
    name      = "myapp";
    chart     = lib.nixhelm.bitnami.myapp;
    namespace = vars.namespaces.myapp;
    values    = finalValues;
  };
}

Raw manifest

Use lib.mkRawManifest when you need to write Kubernetes resources directly as Nix attribute sets - no Helm chart involved:

{ pkgs, inputs, lib, vars }:

let
  deploymentResource = {
    apiVersion = "apps/v1";
    kind       = "Deployment";
    metadata   = {
      name      = "myapp";
      namespace = vars.namespaces.myapp;
    };
    spec = {
      replicas              = 1;
      selector.matchLabels  = { app = "myapp"; };
      template = {
        metadata.labels = { app = "myapp"; };
        spec.containers = [{
          name  = "myapp";
          image = "myapp:latest";
          ports = [{ containerPort = 8080; }];
        }];
      };
    };
  };

  serviceResource = {
    apiVersion = "v1";
    kind       = "Service";
    metadata   = { name = "myapp"; namespace = vars.namespaces.myapp; };
    spec = {
      selector  = { app = "myapp"; };
      type      = "ClusterIP";
      ports     = [{ port = 80; targetPort = 8080; }];
    };
  };
in {
  myapp = lib.mkRawManifest {
    name      = "myapp";
    namespace = vars.namespaces.myapp;
    resources = [ deploymentResource serviceResource ];
  };
}

A single service file can return multiple chart keys. The deployment group in default.nix references them individually.

Secret reference

Use lib.mkSecretRef to inject a SOPS secret into a Kubernetes Secret that your workloads mount:

myapp-password = lib.mkSecretRef {
  name           = "myapp-password";      # chart key (must be unique across all services)
  namespace      = vars.namespaces.myapp;
  secretName     = "myapp-credentials";   # name of the Kubernetes Secret object
  secretKey      = "password";            # key inside the Secret
  sopsSecretName = "myapp_password";      # key in secrets.yaml / sops/default.nix
};

At deploy time, k8s-deploy reads the decrypted value from the SOPS-managed file and patches it into the Kubernetes Secret. Multiple mkSecretRef entries can target the same secretName with different keys - patch-merge handles this without clobbering other keys.
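
The patch-merge behaviour can be pictured with a short bash sketch. This is not the k8s-deploy code: the function names are hypothetical, and it assumes the Secret already exists and kubectl is on PATH.

```shell
#!/usr/bin/env bash
# secret_patch_json KEY VALUE
# Print a JSON merge patch that sets one key of a Secret. Secret data must be
# base64-encoded; tr strips the line wraps some base64 implementations add.
secret_patch_json() {
  local key="$1" value="$2" encoded
  encoded="$(printf '%s' "$value" | base64 | tr -d '\n')"
  printf '{"data":{"%s":"%s"}}' "$key" "$encoded"
}

# patch_secret NAMESPACE SECRET KEY FILE
# Merge the contents of FILE into SECRET under KEY, leaving other keys alone.
# Assumes the Secret already exists and kubectl is on PATH.
patch_secret() {
  local ns="$1" secret="$2" key="$3" file="$4"
  kubectl -n "$ns" patch secret "$secret" --type merge \
    -p "$(secret_patch_json "$key" "$(cat "$file")")"
}
```

A call such as `patch_secret myapp-system myapp-credentials password /run/secrets/myapp_password` touches only the password key, which is why several mkSecretRef entries can share one Secret.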

The corresponding SOPS secret must be declared in modules/system/sops/default.nix:

sops.secrets.myapp_password = { owner = config.users.users.${username}.name; };

And added to modules/system/sops/secrets.yaml.example (and your encrypted secrets.yaml).

Step 2: Add a namespace

Open modules/system/k8s/charts.nix and add your namespace to vars.namespaces:

namespaces = {
  # ...existing entries...
  myapp = "myapp-system";
};

The deployment script creates all declared namespaces before applying any charts, so you don't need to create it manually.

Step 3: Import in charts.nix

Add your service file to the appropriate group in charts.nix. Pick the category that fits or add a new one:

appServices = {
  signalProxy = import ./services/apps/signal-proxy.nix { inherit pkgs inputs lib vars; };
  nextcloud   = import ./services/apps/nextcloud.nix    { inherit pkgs inputs lib vars; };
  myapp       = import ./services/apps/myapp.nix        { inherit pkgs inputs lib vars; };  # added
};

If your service needs config (to read SOPS secret paths), pass it too:

myapp = import ./services/apps/myapp.nix { inherit pkgs inputs lib vars config; };

Step 4: Add to a deployment group

Open modules/system/k8s/default.nix and add the chart name(s) to an existing group or create a new one.

Adding to an existing group

{
  name   = "apps";
  charts = [ "signal-proxy" "nextcloud" "myapp" ];   # added myapp
  dependsOn = [ "core-config" "networking-services" "external-access" ];
  waitFor = {
    # ...existing waitFor entries...
    myapp = {
      kind      = "deployment";
      name      = "myapp";
      namespace = "myapp-system";
      timeout   = 120;
    };
  };
}

Creating a new group

If your service has different dependencies, add a new group in the right position in the list:

{
  name         = "myapp-services";
  charts       = [ "myapp" ];   # mkSecretRef keys (e.g. myapp-password) are not listed here
  dependsOn    = [ "core-config" "networking-services" ];
  retryAttempts = 3;    # default; optional
  retryDelay    = 10;   # seconds between retries; optional
  waitFor = {
    myapp = {
      kind      = "deployment";
      name      = "myapp";
      namespace = "myapp-system";
      timeout   = 180;
    };
  };
}

Group fields:

Field | Required | Default | Description
name | yes | - | Unique identifier; used for sentinel files under /var/lib/kubernetes/
charts | yes | - | Chart keys to deploy; must exist in regularCharts (i.e. mkChart or mkRawManifest, not mkSecretRef)
dependsOn | no | [] | Group names that must have completed before this group runs
waitFor | no | {} | Resources to wait for after deploying this group before proceeding
retryAttempts | no | 3 | How many times to retry a failed kubectl apply
retryDelay | no | 10 | Seconds to wait between retries
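
The retry fields behave like a simple loop around the apply command. A minimal bash sketch, assuming retryAttempts counts total attempts (the deploy script may count differently) and using a hypothetical function name:

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY COMMAND...
# Run COMMAND until it succeeds or ATTEMPTS tries have been used.
# Treating ATTEMPTS as the total number of tries is an assumption.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after ${n} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${n} failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

With the defaults this would be invoked along the lines of `retry 3 10 kubectl apply -f /var/lib/kubernetes/manifests/myapp.yaml`.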

waitFor values:

Field | Description
kind | "deployment" uses kubectl rollout status; anything else uses kubectl wait --for=condition=Available
name | Resource name
namespace | Resource namespace
timeout | Seconds before giving up (warning only - deploy continues)
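
Put together, a waitFor entry maps to one kubectl invocation. The sketch below only prints the command so the mapping is easy to inspect; the function name is hypothetical and the exact flag order in the real script may differ.

```shell
#!/usr/bin/env bash
# wait_command KIND NAME NAMESPACE TIMEOUT_SECONDS
# Print the kubectl call that corresponds to one waitFor entry.
wait_command() {
  local kind="$1" name="$2" ns="$3" timeout="$4"
  if [ "$kind" = "deployment" ]; then
    echo "kubectl -n ${ns} rollout status deployment/${name} --timeout=${timeout}s"
  else
    echo "kubectl -n ${ns} wait --for=condition=Available ${kind}/${name} --timeout=${timeout}s"
  fi
}
```

Since a timeout is only a warning, a caller would run the printed command and log on failure rather than abort the deploy.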

The vars object

Every service file receives vars from charts.nix. The full set of available values:

vars = {
  domain      = "yourdomain.example.com";  # from vars.nix
  upstreamDns = "192.168.1.1";             # from vars.nix
  wireguardUsers = { ... };                # from vars.nix

  namespaces = {
    dns        = "dns-system";
    pihole     = "pihole-system";
    nginx      = "nginx-system";
    metallb    = "metallb-system";
    longhorn   = "longhorn-system";
    monitoring = "monitoring-system";
    wireguard  = "wireguard-system";
    signalProxy = "signal-proxy";
    nextcloud  = "nextcloud";
    # ...your additions
  };

  ipPools = {
    metallb       = "192.168.1.192/26";
    nginxExternal = "192.168.1.193";
    pihole        = "192.168.1.250";
    wireguard     = "192.168.1.194";
  };

  piholeIp = "192.168.1.250";    # alias for ipPools.pihole

  versions = { pihole = "2025.11.1"; };  # pinned image versions

  defaultReplicas = 1;

  tls = {
    defaultIssuer        = "letsencrypt-prod";
    stagingIssuer        = "letsencrypt-staging";
    acmeServerProduction = "https://acme-v02.api.letsencrypt.org/directory";
    acmeServerStaging    = "https://acme-staging-v02.api.letsencrypt.org/directory";
  };
};

The lib object

Functions available in service files:

Function | Description
lib.mkChart { name, chart, namespace, values } | Renders a Helm chart to YAML
lib.mkRawManifest { name, namespace, resources } | Renders a list of Nix attrsets to a YAML stream
lib.mkSecretRef { name, namespace, secretName, secretKey, sopsSecretName } | Injects a SOPS secret into a Kubernetes Secret
lib.overlayValues defaults overrides | Deep-merges two attrsets, with overrides winning
lib.nixhelm | All charts available via nixhelm - reference as lib.nixhelm.<org>.<chart>
lib.kubelib | nix-kube-generators utilities (buildHelmChart, toYAMLStreamFile)

Deploy

After making the changes:

# Check the flake evaluates
nix flake show

# Deploy to the master
colmena apply --on @master

The activation script (kubernetes-prepare) writes the rendered YAML to /var/lib/kubernetes/manifests/<chartname>.yaml and restarts k8s-deploy. The new chart is deployed in the order defined by its deployment group.

To watch the deployment live:

ssh youruser@master.local
sudo journalctl -fu k8s-deploy

Managing Secrets

All secrets are stored in modules/system/sops/secrets.yaml, encrypted with your age key via SOPS. The file is safe to commit - SOPS encryption means only the holder of the age private key can decrypt it. sops-nix decrypts the file at activation time and writes each secret to a tmpfs path that NixOS modules and the k8s deploy script can read.

Adding a new secret

Adding a secret requires three changes:

  1. Add the key to secrets.yaml
  2. Declare it in modules/system/sops/default.nix
  3. Reference the path in your module

1. Edit secrets.yaml

secrets.yaml is an encrypted YAML file. Open it in-place with SOPS - it decrypts to your editor, re-encrypts on save:

sops modules/system/sops/secrets.yaml

Add your key:

myapp_api_key: "the-actual-secret-value"

Save and close. SOPS re-encrypts immediately. If you have not yet encrypted the file (initial setup), fill it in plain text first and then encrypt:

sops --encrypt --in-place modules/system/sops/secrets.yaml

Also add a placeholder to modules/system/sops/secrets.yaml.example so future users know the key exists:

myapp_api_key: "your-myapp-api-key"

2. Declare in sops/default.nix

Open modules/system/sops/default.nix and add an entry inside sops.secrets:

sops.secrets = {
  # ...existing secrets...

  myapp_api_key = { owner = "${username}"; };
};

sops-nix decrypts this key and writes it to /run/secrets/myapp_api_key during system activation, which runs on every boot and every deploy.

Secret options

Option | Default | Description
owner | "root" | Unix user that owns the decrypted file
group | "root" | Unix group that owns the decrypted file
mode | "0400" | File permissions on the decrypted file
path | /run/secrets/<name> | Override where the decrypted file is written
neededForUsers | false | Set true for secrets used in users.users.<name>.hashedPasswordFile - decrypted before user activation
restartUnits | [] | systemd units to restart when this secret changes

Examples:

# Readable only by root (default)
myapp_api_key = {};

# Readable by a specific user
myapp_api_key = { owner = "${username}"; };

# Readable by all (e.g. a public key or non-sensitive config)
wireguard_server_public_key = { owner = "root"; mode = "0644"; };

# Written to a custom path (e.g. expected by a hardcoded service)
private_ssh_key = {
  path  = "/home/${username}/.ssh/id_ed25519";
  mode  = "0600";
  owner = "${username}";
};

# User password - must be decrypted before users are activated
user_password = { neededForUsers = true; };

3. Reference in your module

Use config.sops.secrets.<name>.path to get the runtime path of the decrypted file:

{ config, ... }:
{
  services.myapp = {
    enable = true;
    # Pass the path to the decrypted file, not the value itself
    apiKeyFile = config.sops.secrets.myapp_api_key.path;
  };
}

Or read it inline in a shell script (e.g. in a systemd service's script):

systemd.services.myapp = {
  script = ''
    API_KEY=$(cat ${config.sops.secrets.myapp_api_key.path})
    exec myapp --api-key "$API_KEY"
  '';
};
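
When a script reads secrets this way, a small guard can make a missing or empty file fail with a clear message instead of passing an empty value on. The helper below is a hypothetical sketch, not something nixlab ships:

```shell
#!/usr/bin/env bash
# read_secret PATH
# Print the secret, or fail with a message if the file is unreadable or empty.
read_secret() {
  local path="$1"
  if [ ! -r "$path" ]; then
    echo "secret file not readable: ${path}" >&2
    return 1
  fi
  if [ ! -s "$path" ]; then
    echo "secret file is empty: ${path}" >&2
    return 1
  fi
  cat "$path"
}
```

In a service script this would replace the bare cat, for example `API_KEY="$(read_secret /run/secrets/myapp_api_key)" || exit 1`.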

Injecting a secret into a Kubernetes Secret

For secrets used by Kubernetes workloads, use lib.mkSecretRef in the service file instead of referencing config.sops.secrets directly. The k8s-deploy script reads the decrypted file and patches it into a Kubernetes Secret object.

See Adding a Chart - Secret reference for the full workflow.

Rotating a secret

  1. Open the file: sops modules/system/sops/secrets.yaml
  2. Change the value
  3. Save - SOPS re-encrypts
  4. Redeploy: colmena apply

sops-nix detects the changed secret and restarts any units listed in restartUnits for that secret. Services that read the decrypted file at startup, rather than having the value baked in at build time, pick up the new value when they restart.

Re-keying (replacing the age key)

If you need to rotate the age key itself:

# Generate a new key
age-keygen -o ~/.config/sops/age/keys.txt.new

# Update .sops.yaml with the new public key, then re-encrypt
sops updatekeys modules/system/sops/secrets.yaml

# Replace the old private key with the new one
mv ~/.config/sops/age/keys.txt.new ~/.config/sops/age/keys.txt

Update /persist/etc/sops-nix/keys.txt on each node with the new private key, then redeploy.
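
To update .sops.yaml you need the public half of the new key. age-keygen records it as a "# public key:" comment inside the key file, so it can be pulled out with a one-liner. The function name below is hypothetical:

```shell
#!/usr/bin/env bash
# age_public_key KEYFILE
# Print the public key recorded in an age-keygen key file. Relies on the
# "# public key: age1..." comment that age-keygen writes into the file.
age_public_key() {
  sed -n 's/^# public key: //p' "$1" | head -n 1
}
```

For example, `age_public_key ~/.config/sops/age/keys.txt.new` prints the recipient string to paste into .sops.yaml. If the comment has been stripped from the file, `age-keygen -y <keyfile>` derives the same value from the private key.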