nixlab
A NixOS homelab template that stands up a k3s cluster with WireGuard VPN, Nextcloud, Pi-hole, and automatic TLS - all configured from a single file.
What it is
nixlab is an opinionated NixOS flake for running a multi-node k3s homelab. The entire cluster is configured from one file (vars.nix). Adding a node means copying a hardware config template, filling in an IP and disk, and running colmena apply. Everything else - k3s roles, service deployment, disk layout, impermanence, secrets - is derived automatically.
Every boot wipes / via a btrfs rollback in the initrd. Only /persist survives, so your nodes are always in a known-good state.
What's included
| Service | Purpose | Exposure |
|---|---|---|
| MetalLB | LoadBalancer IPs from your LAN | pool from vars.metallbPool |
| Longhorn | Distributed block storage | internal |
| nginx ingress | HTTP/S ingress controller | vars.nginxIp |
| Pi-hole | LAN DNS + ad blocking | vars.piholeIp |
| ExternalDNS (Pi-hole) | Auto-registers local DNS from ingress | LAN |
| ExternalDNS (Cloudflare) | Auto-registers public DNS | public |
| cert-manager | Let's Encrypt TLS via DNS-01 | - |
| DDNS | Keeps Cloudflare A record current | - |
| WireGuard | VPN with per-user Nextcloud SSO | vars.wireguardIp |
| Nextcloud | Self-hosted cloud storage | nextcloud.<vars.domain> |
| Signal proxy | Signal messenger proxy | signal.<vars.domain> |
Design principles
One config file. vars.nix is the only file you edit. Everything the cluster needs - usernames, IPs, domains, nodes, WireGuard users - lives there. Nix propagates it everywhere.
Nodes are just hardware. Every NixOS config comes from modules/system/node.nix, which reads your node's entry in vars.nodes by hostname. A hosts/<name>/hardware-configuration.nix file is the only per-node artifact.
Immutable by default. Every boot starts from a clean btrfs subvolume. State that needs to survive goes into /persist via the impermanence module. This makes nodes predictable and easy to rebuild.
No hand-rolling YAML. Kubernetes manifests are generated in Nix from nixhelm chart definitions and nix-kube-generators. The master node's activation script applies them in dependency order via a systemd service - no external CD tool required.
Navigation
- Architecture: cluster topology, impermanence design, how vars.nix flows through the system
- Getting Started: step-by-step setup with exact commands
- Configuration: full vars.nix field reference
- Adding Nodes: how to add a new node to the cluster
- Services: what each service does and how it's configured
- WireGuard VPN: managing VPN users, client setup, access groups
Architecture
Cluster topology
nixlab runs a k3s cluster where nodes are divided into exactly one master and any number of workers. The master runs the k3s server process plus all Kubernetes workloads. Workers run k3s agent and contribute CPU, RAM, and disk (via Longhorn) to the cluster.
graph TD
subgraph LAN["LAN (192.168.1.0/24)"]
Router["Router / DHCP"]
subgraph Cluster["k3s cluster"]
Master["master node\nk3s server\nk8s-deploy service\nall Helm charts"]
Worker1["worker node\nk3s agent\nLonghorn replica"]
Worker2["worker node\nk3s agent\nLonghorn replica"]
end
subgraph MetalLB["MetalLB IP pool (192.168.1.192/26)"]
PiholeIP["Pi-hole\n192.168.1.250"]
WgIP["WireGuard\n192.168.1.194"]
NginxIP["nginx ingress\n192.168.1.193"]
end
Client["LAN client"]
VPNClient["VPN client\n10.0.100.x"]
end
Cloudflare["Cloudflare DNS\n+ ACME DNS-01"]
Router -->|"DHCP / mDNS"| Master
Router -->|"DHCP / mDNS"| Worker1
Router -->|"DHCP / mDNS"| Worker2
Client -->|"DNS"| PiholeIP
Client -->|"HTTP/S"| NginxIP
VPNClient -->|"WireGuard UDP 51820"| WgIP
VPNClient -->|"Nextcloud HTTPS via caddy sidecar"| WgIP
Master -->|"DDNS + ExternalDNS"| Cloudflare
Cloudflare -->|"DNS-01 challenge"| Master
How vars.nix flows through the system
vars.nix is a plain Nix attribute set. It is imported at three levels:
- flake.nix: reads vars.nodes to build nixosConfigurations and the Colmena deployment set. Each node becomes one configuration, using node.hostname as the NixOS hostname and node.tags as Colmena tags for targeted deploys.
- modules/system/node.nix: receives hostname as a specialArg and looks up vars.nodes.${hostname} to set diskConfig.device, diskConfig.espSize, and the master flag for k3s.
- modules/system/k8s/charts.nix and all service files under k8s/services/: import vars.nix directly and read domain, metallbPool, piholeIp, wireguardIp, nginxIp, wireguardUsers, and upstreamDns to populate Helm values and raw manifests.
graph LR
V["vars.nix"]
F["flake.nix\nnixosConfigurations\ncolmena"]
N["node.nix\ndiskConfig\nk3s.master"]
C["charts.nix\nHelm values\nraw manifests"]
V --> F
V --> N
V --> C
F --> N
Nothing else in the repo needs to be edited to configure the cluster. vars.nix is the single source of truth.
Disk layout and impermanence
Each node's disk is partitioned with Disko:
/dev/sdX (GPT)
├── boot (1 MiB, BIOS boot)
├── ESP (500 MiB, vfat, /boot)
└── root (remainder, LVM PV)
    └── root_vg / root (LVM LV)
        └── btrfs
            ├── subvol /root → / (wiped on every boot)
            ├── subvol /persist → /persist (survives reboots)
            └── subvol /nix → /nix (Nix store, survives reboots)
During early boot, a systemd initrd service (rollback) runs before / is mounted:
- Mounts the raw btrfs volume.
- Renames the current /root subvolume to /old_roots/<timestamp>.
- Creates a fresh empty /root subvolume.
- Deletes old_roots entries older than 30 days.
- Unmounts and lets the normal boot continue.
The result: every boot starts from a clean slate. Files written to / during a session vanish on the next reboot.
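The rollback steps above can be sketched as a small shell function. This is an illustration rather than nixlab's actual initrd service: the device path /dev/root_vg/root (inferred from the LVM layout above), the mount point /btrfs_tmp, the timestamp format, and the DRY_RUN switch are all assumptions.

```shell
#!/bin/sh
# Sketch of the initrd rollback steps. DEVICE and MNT are assumptions,
# not values taken from the nixlab source.
DEVICE=${DEVICE:-/dev/root_vg/root}
MNT=${MNT:-/btrfs_tmp}

# With DRY_RUN=1, print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

rollback_root() {
  # 1. Mount the top level of the btrfs volume.
  run mkdir -p "$MNT"
  run mount -o subvol=/ "$DEVICE" "$MNT"

  # 2. Move the current root subvolume aside under a timestamp.
  run mkdir -p "$MNT/old_roots"
  if [ -e "$MNT/root" ]; then
    run mv "$MNT/root" "$MNT/old_roots/$(date +%Y-%m-%d_%H-%M-%S)"
  fi

  # 3. Create a fresh, empty root subvolume.
  run btrfs subvolume create "$MNT/root"

  # 4. Delete archived roots whose mtime is more than 30 days old.
  #    (Nested subvolumes would have to be deleted first; omitted here.)
  if [ -d "$MNT/old_roots" ]; then
    find "$MNT/old_roots" -mindepth 1 -maxdepth 1 -mtime +30 |
      while IFS= read -r old; do
        run btrfs subvolume delete "$old"
      done
  fi

  # 5. Unmount so the normal boot can mount / as usual.
  run umount "$MNT"
}
```

Calling rollback_root with DRY_RUN=1 prints the commands without touching any disk, which is a safe way to inspect the sequence.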
What persists
The impermanence module (modules/system/impermanence/default.nix) bind-mounts selected paths from /persist into the fresh /:
System state:
- /etc/nixos, /var/log, /var/lib/nixos - NixOS metadata
- /etc/NetworkManager/system-connections, /var/lib/NetworkManager - network config
- /var/lib/kubelet, /var/lib/rancher/k3s, /etc/rancher - k3s state
- /var/lib/longhorn, /var/lib/csi, /var/lib/docker - storage
- /etc/machine-id, /etc/adjtime - stable machine identity
SSH host keys live at /persist/etc/ssh/ssh_host_ed25519_key and ssh_host_rsa_key. The openssh service is configured to use those paths so host key fingerprints don't change across reboots.
User home directories: ~/Code, ~/Documents, ~/.ssh, ~/.gnupg, ~/.config, ~/.local and ~/.config/sops are persisted per-user.
SOPS age key: /persist/etc/sops-nix/keys.txt - the age private key used to decrypt secrets.yaml.
k3s module
modules/system/k3s/default.nix configures k3s based on node.master:
- Master (master = true): runs as server with --cluster-init, --disable=servicelb, --disable=traefik, --disable=local-storage. Writes kubeconfig to /etc/rancher/k3s/k3s.yaml (mode 0644 so the cluster user can read it).
- Agent (master = false): runs as agent and derives the master's URL from vars.nodes at build time - serverAddr = "https://${masterHostname}:6443". No manual IP configuration needed.
Both roles use config.sops.secrets.k3s_token.path as the token file and run openiscsi (required by Longhorn).
The firewall is disabled cluster-wide (mkForce false) since k3s and MetalLB handle their own iptables rules. The host firewall interferes with CNI traffic.
Kubernetes chart deployment
On the master node, a NixOS activation script (kubernetes-prepare) and a systemd service (k8s-deploy) handle all Kubernetes deployments.
Activation script (runs on every nixos-rebuild switch / colmena apply):
- Writes rendered manifest YAML files to /var/lib/kubernetes/manifests/.
- Clears deployment sentinel files (/var/lib/kubernetes/.deploy-<group>-done) for any group whose chart content changed (detected by comparing Nix store paths).
- Restarts the k8s-deploy service.
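The sentinel-clearing step can be pictured as follows. This is a minimal sketch, not the real activation script: it assumes the last deployed store path of each group is remembered in a bookkeeping file named .deploy-<group>-path, which is a hypothetical name chosen for this example.

```shell
#!/bin/sh
# Directory holding sentinel files (path taken from the text above).
STATE_DIR=${STATE_DIR:-/var/lib/kubernetes}

# refresh_sentinel GROUP STORE_PATH
# Remove the group's sentinel when the Nix store path of its rendered
# charts differs from the one recorded on the previous activation.
refresh_sentinel() (
  group=$1
  new_path=$2
  record="$STATE_DIR/.deploy-$group-path"   # hypothetical bookkeeping file
  sentinel="$STATE_DIR/.deploy-$group-done"

  old_path=$(cat "$record" 2>/dev/null || true)
  if [ "$old_path" != "$new_path" ]; then
    rm -f "$sentinel"
    printf '%s\n' "$new_path" > "$record"
  fi
)
```

Because a Nix store path changes whenever the content it was built from changes, comparing two paths as strings is enough to detect a changed chart; no diffing of YAML is needed.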
k8s-deploy service (oneshot, runs after k3s.service):
- Waits for the Kubernetes API to respond.
- Creates all required namespaces.
- Deletes all existing Jobs (to unblock re-runs).
- Reads SOPS secrets and creates/patches Kubernetes Secret objects for credentials that services reference.
- Deploys chart groups in dependency order, retrying each chart up to 3 times (configurable per group).
- After each group succeeds, writes a sentinel file and waits for the declared readiness conditions before moving to the next group.
- Runs the Nextcloud SSO setup script.
Deployment groups, in order:
| Group | Charts | Waits for |
|---|---|---|
| core-infrastructure | longhorn, metallb | longhorn-driver-deployer, metallb-controller |
| core-config | metallb-config | - (retries 5×) |
| networking-services | ingress-nginx, pihole | ingress-nginx-controller |
| dns-services | externaldns-pihole | external-dns |
| external-access | cert-manager | cert-manager deployment |
| external-dns | externaldns-cloudflare, cert-manager-issuers, cloudflare-ddns | external-dns |
| external-ingress | pihole-external-ingress | - |
| vpn-services | wireguard-config, wireguard-caddy-cert, wireguard-storage, wireguard-deployment, wireguard-service | wireguard deployment |
| apps | signal-proxy, nextcloud | signal-proxy, nextcloud deployments |
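The per-group retry and sentinel behaviour described above could look roughly like this. It is a sketch under assumptions, not the k8s-deploy service itself: the APPLY command, the RETRY_DELAY variable, and the choice to skip a group whose sentinel already exists are all assumed for illustration, and the readiness waits are left out.

```shell
#!/bin/sh
# Assumed defaults; the real service may apply charts differently.
STATE_DIR=${STATE_DIR:-/var/lib/kubernetes}
APPLY=${APPLY:-"kubectl apply -f"}

# deploy_group GROUP MAX_TRIES MANIFEST...
# Apply each manifest, retrying up to MAX_TRIES times, then write the
# group's sentinel file. A group with an existing sentinel is skipped.
deploy_group() (
  group=$1
  max_tries=$2
  shift 2
  sentinel="$STATE_DIR/.deploy-$group-done"

  if [ -e "$sentinel" ]; then
    exit 0
  fi

  for manifest in "$@"; do
    try=1
    until $APPLY "$manifest"; do
      if [ "$try" -ge "$max_tries" ]; then
        echo "$group: $manifest failed after $try attempts" >&2
        exit 1
      fi
      try=$((try + 1))
      sleep "${RETRY_DELAY:-10}"
    done
  done

  # Reached only when every manifest in the group was applied.
  : > "$sentinel"
)
```

A call such as deploy_group core-config 5 /var/lib/kubernetes/manifests/metallb-config.yaml mirrors the "retries 5×" entry in the table; the manifest file name is illustrative.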
SOPS secrets
All secrets live in modules/system/sops/secrets.yaml, encrypted with SOPS + age. The key file location on running nodes is /persist/etc/sops-nix/keys.txt. sops-nix decrypts secrets at activation time.
modules/system/sops/default.nix declares every secret and maps it to a file path that NixOS services can read. WireGuard user public keys are registered automatically by iterating over enabled entries in vars.wireguardUsers - each entry's publicKeySecret field becomes a SOPS secret name.
The Kubernetes deployment script reads decrypted secret paths from config.sops.secrets.<name>.path and pushes them into Kubernetes Secrets via kubectl patch.
Getting Started
This guide walks you through setting up nixlab from scratch: generating keys, filling in vars.nix, preparing your first node, and deploying the cluster.
Prerequisites
You need the following tools on your workstation (not on the nodes):
- Nix with flakes enabled (experimental-features = nix-command flakes in ~/.config/nix/nix.conf)
- age: for generating the encryption key
- sops: for creating and editing the secrets file
- Colmena: for deploying to nodes
Install them all with:
nix profile install nixpkgs#age nixpkgs#sops nixpkgs#colmena
Or temporarily via nix shell:
nix shell nixpkgs#age nixpkgs#sops nixpkgs#colmena
Step 1: Clone the repo
git clone https://github.com/thatbagu/nixlab
cd nixlab
Step 2: Generate your age key
nixlab uses age for encrypting secrets. Generate a key pair:
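mkdir -p ~/.config/sops/age
age-keygen -o ~/.config/sops/age/keys.txt
The mkdir is needed because age-keygen does not create missing parent directories. The command prints the public key to the terminal as a "Public key: age1..." line. Copy it - you need it in the next step. To print it again later, run age-keygen -y ~/.config/sops/age/keys.txt.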
Now update .sops.yaml with your public key. Open it and replace the placeholder:
keys:
- &primary age1REPLACE_WITH_YOUR_AGE_PUBLIC_KEY
Change age1REPLACE_WITH_YOUR_AGE_PUBLIC_KEY to the public key that age-keygen printed.
Step 3: Generate a cluster SSH key
All nodes use a single SSH key for cluster access:
ssh-keygen -t ed25519 -C "nixlab-cluster" -f ~/.ssh/nixlab-cluster
Note the public key:
cat ~/.ssh/nixlab-cluster.pub
Step 4: Fill in vars.nix
Open vars.nix and replace every placeholder value with your real values:
{
  username = "youruser"; # your Linux username
  timezone = "Europe/Berlin"; # timedatectl list-timezones
  clusterSshKey = "ssh-ed25519 AAAA... youruser@host"; # from step 3
  nodes = {
    master = {
      hostname = "mymaster"; # must match hosts/<hostname>/
      master = true;
      disk = "/dev/sda"; # check with lsblk on the target machine
      tags = [ "homelab" "master" "mymaster" ];
    };
  };
  domain = "yourdomain.example.com"; # Cloudflare-managed domain
  metallbPool = "192.168.1.192/26"; # outside your DHCP range
  piholeIp = "192.168.1.250";
  wireguardIp = "192.168.1.194";
  nginxIp = "192.168.1.193";
  upstreamDns = "192.168.1.1"; # your router
  wireguardUsers = {}; # add users later with add-wg-user.sh
}
The IPs in metallbPool, piholeIp, wireguardIp, and nginxIp must all be in the same subnet and outside your router's DHCP assignment range.
See Configuration for a full field reference.
Step 5: Create and encrypt secrets.yaml
Copy the example file:
cp modules/system/sops/secrets.yaml.example modules/system/sops/secrets.yaml
Fill in the real values. For secrets you need to generate:
# k3s cluster join token - any long random string
openssl rand -hex 32
# WireGuard server keys
wg genkey | tee /tmp/wg-server.key | wg pubkey > /tmp/wg-server.pub
cat /tmp/wg-server.key # wireguard_server_private_key
cat /tmp/wg-server.pub # wireguard_server_public_key
# Linux user password hash (replace 'yourpassword')
mkpasswd -m sha-512 yourpassword
For wireguard_server_endpoint: use your public IP or a DDNS hostname. The DDNS service (if enabled) will keep Cloudflare updated, but the WireGuard endpoint in secrets.yaml is what clients use to connect.
For Cloudflare credentials: create an API token at https://dash.cloudflare.com/profile/api-tokens with Zone:DNS:Edit permission. The email is your Cloudflare account email.
For private_ssh_key: this is the private key of the cluster SSH key from step 3. The full private key, including the header/footer lines.
Once secrets.yaml is filled in with real values, encrypt it:
sops --encrypt --in-place modules/system/sops/secrets.yaml
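SOPS encrypts to the age public key listed in .sops.yaml; decrypting or editing the file later uses the private key in ~/.config/sops/age/keys.txt. The encrypted file is safe to commit - commit it now: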
git add modules/system/sops/secrets.yaml
git commit -m "add encrypted secrets"
Step 6: Prepare the first node's hardware config
Boot your target machine with a NixOS installer ISO. Once booted:
nixos-generate-config --no-filesystems
cat /etc/nixos/hardware-configuration.nix
The --no-filesystems flag skips filesystem detection (Disko handles that). Copy the output to your workstation:
mkdir -p hosts/mymaster
# paste the content into hosts/mymaster/hardware-configuration.nix
The file should look something like:
{ config, lib, modulesPath, ... }:
{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];
  boot.initrd.availableKernelModules = [ "nvme" "xhci_pci" "ahci" "usbhid" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ "kvm-intel" ]; # or kvm-amd
  boot.extraModulePackages = [ ];
  swapDevices = [ ];
  networking.useDHCP = lib.mkDefault true;
  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
}
The hostname in vars.nodes must match the directory name under hosts/.
Step 7: Initial install
From the NixOS installer on the target machine, with the repo available (via git clone or a mounted drive):
# Install using Disko to partition the disk, then NixOS
nix run github:nix-community/disko -- --mode disko --flake .#mymaster
nixos-install --flake .#mymaster --no-root-password
Alternatively, if you already have a running NixOS system on the node (even a minimal one), you can deploy directly from your workstation:
colmena apply --on mymaster
Colmena connects via SSH (<hostname>.local using mDNS, as the targetHost in the flake) and switches the system.
Step 8: Verify
After the install reboots:
# SSH into the master
ssh -i ~/.ssh/nixlab-cluster youruser@mymaster.local
# Check k3s is running
sudo k3s kubectl get nodes
# Watch service deployment (takes a few minutes on first boot)
sudo journalctl -fu k8s-deploy
The k8s-deploy service applies all Kubernetes charts in dependency order. Once it finishes, all services should be running:
sudo k3s kubectl get pods --all-namespaces
Step 9: Add more nodes
See Adding Nodes.
Step 10: Add VPN users
See WireGuard VPN.
Subsequent deploys
After changing vars.nix or any module:
# Deploy to all nodes
colmena apply
# Deploy only to master
colmena apply --on @master
# Deploy only to workers
colmena apply --on @worker
# Deploy to a specific node
colmena apply --on mymaster
Colmena uses the tags field in vars.nodes entries to resolve @master and @worker selectors.
Configuration Reference
vars.nix is the only file you edit to configure nixlab. It is a plain Nix attribute set that the flake and all modules import directly.
This page documents every field.
Top-level fields
username
Type: string
Example: "alice"
The Linux username created on every cluster node. This user:
- Has sudo access (NOPASSWD for all commands)
- Can SSH in with clusterSshKey
- Owns the SOPS-decrypted secrets that need non-root access
- Has their home directory persisted at /persist/home/<username>/
The same username is used on all nodes. There is no per-node user configuration.
timezone
Type: string
Example: "Europe/Berlin"
The timezone for all cluster nodes. Set via time.timeZone.
Find valid values:
timedatectl list-timezones
clusterSshKey
Type: string (SSH public key)
Example: "ssh-ed25519 AAAA... user@host"
The SSH public key added to ~/.ssh/authorized_keys for username on every node. This is how Colmena connects to nodes for deployment.
Generate a dedicated cluster key:
ssh-keygen -t ed25519 -C "nixlab-cluster" -f ~/.ssh/nixlab-cluster
nodes
Type: attribute set of node definitions
Default: {} (empty - cluster won't deploy)
Each attribute in nodes defines one cluster node. The attribute name is arbitrary (used as a label in Colmena); the hostname field is what actually matters.
nodes = {
  master = {
    hostname = "mymaster";
    master = true;
    disk = "/dev/sda";
    tags = [ "homelab" "master" "mymaster" ];
  };
  worker1 = {
    hostname = "worker1";
    master = false;
    disk = "/dev/nvme0n1";
    tags = [ "homelab" "worker" "worker1" ];
  };
};
nodes.<name>.hostname
Type: string
Required
The NixOS hostname for this node. Must match:
- The directory name under hosts/ containing hardware-configuration.nix
- The hostname the machine announces via mDNS (i.e., <hostname>.local must resolve on your LAN)
The flake sets networking.hostName = hostname for each node.
nodes.<name>.master
Type: bool
Required
Set to true for exactly one node. That node runs the k3s server process and all Kubernetes workloads (via the k8s-deploy service). All other nodes are k3s agents.
The master's hostname is derived automatically from vars.nodes at build time and embedded in the agent serverAddr - no manual IP configuration needed.
nodes.<name>.disk
Type: string (device path)
Required
Example: "/dev/sda", "/dev/nvme0n1"
The block device Disko will partition. This disk will be completely wiped during installation. Verify with lsblk on the target machine before setting this value.
nodes.<name>.espSize
Type: string (size with unit)
Default: "500M"
Size of the EFI system partition. The default is sufficient for most setups. Increase if you store many NixOS generations in /boot.
nodes.<name>.tags
Type: list of strings
Required
Colmena deployment tags. Used to target groups of nodes:
colmena apply --on @master # deploys to nodes tagged "master"
colmena apply --on @worker # deploys to nodes tagged "worker"
Include at minimum the role tag ("master" or "worker") and the hostname. Additional tags are arbitrary.
Networking fields
domain
Type: string
Example: "home.example.com"
Your public domain, managed by Cloudflare. Used for:
- Let's Encrypt TLS certificates (DNS-01 challenge via Cloudflare)
- External DNS records (via ExternalDNS Cloudflare provider)
- Nextcloud's hostname: nextcloud.<domain>
- Signal proxy hostname: signal.<domain>
- Pi-hole external ingress: pihole.<domain>
The domain must be in a Cloudflare-managed zone. The Cloudflare API token in secrets.yaml must have Zone:DNS:Edit permission for this zone.
metallbPool
Type: string (CIDR)
Example: "192.168.1.192/26"
The IP range MetalLB draws from when assigning LoadBalancer IPs. Must be:
- Within your LAN subnet
- Outside your router's DHCP range
- Large enough to hold piholeIp, wireguardIp, nginxIp and any future services
A /26 covers 64 addresses (192.168.1.192 to 192.168.1.255 in the example), which is more than enough.
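Since piholeIp, wireguardIp, and nginxIp each have to fall inside metallbPool, it can help to check them before deploying. The helper below is a small sketch for that; the function names ip_to_int and ip_in_cidr are made up for this example and are not part of nixlab.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1   # split the address into its four octets
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# ip_in_cidr ADDRESS CIDR
# Exit status 0 when ADDRESS lies inside CIDR, non-zero otherwise.
ip_in_cidr() (
  ip=$(ip_to_int "$1")
  base=$(ip_to_int "${2%/*}")   # part before the slash
  prefix=${2#*/}                # part after the slash
  mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( base & mask )) ]
)
```

With the example values from this page, ip_in_cidr 192.168.1.250 192.168.1.192/26 succeeds, while an address such as 192.168.1.100 is rejected.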
piholeIp
Type: string (IP address)
Example: "192.168.1.250"
The static IP assigned to Pi-hole's LoadBalancer service. Pi-hole serves both DNS (port 53) and the web UI on this IP.
Configure your router to hand out this IP as the DNS server for your LAN clients, or set it manually on each device.
Must be within metallbPool.
wireguardIp
Type: string (IP address)
Example: "192.168.1.194"
The static IP assigned to the WireGuard LoadBalancer service (UDP port 51820). VPN clients connect to this IP.
The caddy sidecar inside the WireGuard pod also listens on this IP for HTTPS (TCP 443) to serve Nextcloud with header-injected SSO.
Must be within metallbPool.
nginxIp
Type: string (IP address)
Example: "192.168.1.193"
The static IP assigned to the nginx ingress controller. All HTTP/S traffic for cluster services routes through this IP. ExternalDNS registers ingress hostnames pointing here.
Must be within metallbPool.
upstreamDns
Type: string (IP address)
Example: "192.168.1.1"
The upstream DNS resolver Pi-hole forwards non-blocked queries to. Typically your router's LAN IP.
wireguardUsers
Type: attribute set of user definitions
Default: {} (no VPN users)
Each attribute defines one WireGuard VPN user. Users are applied to the WireGuard server configuration and, optionally, to Nextcloud SSO.
wireguardUsers = {
  "alice" = {
    ip = "10.0.100.2";
    group = "admin";
    publicKeySecret = "alice_wg_public_key";
    allowedIPs = "0.0.0.0/0";
    nextcloudUser = "alice";
    description = "Alice - full admin access";
    enabled = true;
  };
};
wireguardUsers.<name>.ip
Type: string (IP address)
Example: "10.0.100.2"
The VPN IP assigned to this user. Must be unique within the 10.0.100.0/24 range. The server uses .1; users start at .2.
Use add-wg-user.sh to assign IPs automatically - it reads existing allocations from vars.nix and picks the next free one.
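To illustrate what "next free" means here, the sketch below picks the lowest unused address starting at .2. It is not the code of add-wg-user.sh; the function name next_free_vpn_ip is hypothetical, and the caller is assumed to pass the addresses already present in vars.nix as arguments.

```shell
#!/bin/sh
# next_free_vpn_ip [USED_IP...]
# Print the lowest address in 10.0.100.2 .. 10.0.100.254 that is not
# among the arguments (.1 is reserved for the server).
next_free_vpn_ip() (
  host=2
  while [ "$host" -le 254 ]; do
    candidate="10.0.100.$host"
    taken=no
    for used in "$@"; do
      if [ "$used" = "$candidate" ]; then
        taken=yes
      fi
    done
    if [ "$taken" = no ]; then
      echo "$candidate"
      exit 0
    fi
    host=$((host + 1))
  done
  echo "no free address left in 10.0.100.0/24" >&2
  exit 1
)
```

Gaps left by removed users are reused first: with 10.0.100.2 and 10.0.100.4 taken, the function prints 10.0.100.3.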
wireguardUsers.<name>.group
Type: string
Example: "admin", "family", "friends", "guests"
An arbitrary access group label. Not currently enforced by the system (no firewall rules are generated per group), but useful for documentation and future policy enforcement.
wireguardUsers.<name>.publicKeySecret
Type: string
Example: "alice_wg_public_key"
The name of the SOPS secret that holds this user's WireGuard public key. The secret must exist in modules/system/sops/secrets.yaml.
add-wg-user.sh creates this entry automatically.
wireguardUsers.<name>.allowedIPs
Type: string (CIDR or comma-separated CIDRs)
Example: "0.0.0.0/0", "192.168.1.0/24"
Traffic routes the client should send through the VPN tunnel. "0.0.0.0/0" routes all traffic through the VPN (full tunnel). A LAN CIDR routes only homelab traffic (split tunnel).
wireguardUsers.<name>.nextcloudUser
Type: string (optional)
Example: "alice"
If set, the caddy sidecar injects X-Remote-User: <nextcloudUser> when this VPN user (identified by their VPN IP) connects to Nextcloud over the VPN. Nextcloud trusts this header for automatic login - no password prompt when accessing from the VPN.
Omit this field for users who should not have Nextcloud SSO.
wireguardUsers.<name>.description
Type: string
Example: "Alice - full admin access"
A human-readable description. Not used by the system; for documentation only.
wireguardUsers.<name>.enabled
Type: bool
Set to false to disable a user without removing their entry. Disabled users are excluded from the WireGuard server config and their SOPS secret is not registered.
Adding Nodes
Adding a node to the cluster is a four-step process: prepare the hardware config, register the node in vars.nix, do the initial install, and verify that the node joined the cluster.
Step 1: Get the hardware configuration
Boot the target machine with a NixOS installer ISO. Once booted, generate the hardware config:
nixos-generate-config --no-filesystems
The --no-filesystems flag skips auto-detected filesystem entries - Disko generates those declaratively, so you don't want them duplicated.
Copy the generated file to your workstation. The node's hostname in vars.nix must match the directory you create here:
mkdir -p hosts/<hostname>
# copy /etc/nixos/hardware-configuration.nix to hosts/<hostname>/hardware-configuration.nix
A minimal hardware config looks like:
{ config, lib, modulesPath, ... }:
{
  imports = [ (modulesPath + "/installer/scan/not-detected.nix") ];
  boot.initrd.availableKernelModules = [ "nvme" "xhci_pci" "ahci" "usbhid" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ "kvm-intel" ]; # or kvm-amd for AMD CPUs
  boot.extraModulePackages = [ ];
  swapDevices = [ ];
  networking.useDHCP = lib.mkDefault true;
  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
}
Keep only what nixos-generate-config produces - the kernel modules and firmware detection. Remove any fileSystems or disko entries if they appear.
Step 2: Add the node to vars.nix
Open vars.nix and add an entry under nodes:
nodes = {
  master = { ... }; # existing master
  worker1 = {
    hostname = "worker1"; # must match hosts/<hostname>/ directory
    master = false; # true for exactly one node
    disk = "/dev/sda"; # verify with lsblk on the target
    tags = [ "homelab" "worker" "worker1" ];
  };
};
To check the disk device on the target machine while it's running the installer:
lsblk -d -o NAME,SIZE,MODEL
Pick the device you want to install to. The entire disk will be wiped.
Step 3: Initial installation
Option A: Fresh install from the NixOS installer (recommended for new machines)
On the target machine, with the nixlab repo accessible (clone it or mount it):
cd nixlab
# Partition and format the disk with Disko
nix run github:nix-community/disko -- --mode disko --flake .#<hostname>
# Install NixOS
nixos-install --flake .#<hostname> --no-root-password
Disko reads vars.nodes.<name>.disk and creates the GPT → LVM → btrfs layout automatically.
After nixos-install completes, reboot:
reboot
Option B: Deploy from your workstation (for machines with SSH access)
If the target already has a running NixOS system with SSH access:
colmena apply --on <hostname>
Colmena connects to <hostname>.local (mDNS) as vars.username and switches the system.
Note: if the target is not yet running NixOS (e.g., it's running another distro), use Option A.
Step 4: Verify
SSH into the new node:
ssh -i ~/.ssh/nixlab-cluster <username>@<hostname>.local
Check it joined the k3s cluster:
# From the master node
sudo k3s kubectl get nodes
The new node should appear with status Ready within a minute or two of booting. k3s agent derives the master's address from the compiled-in serverAddr - no manual configuration on the worker is needed.
If the node is a worker, Longhorn will automatically pick it up for replica scheduling once the Longhorn manager pod starts on the new node.
Subsequent deploys
After the initial install, subsequent config changes are applied with:
# Deploy only to the new node
colmena apply --on <hostname>
# Or deploy to all workers at once
colmena apply --on @worker
Removing a node
- Drain the node in Kubernetes to migrate workloads away:
kubectl drain <hostname> --ignore-daemonsets --delete-emptydir-data
- Delete the node from the cluster:
kubectl delete node <hostname>
- Remove the entry from vars.nodes in vars.nix.
- Delete the hosts/<hostname>/ directory.
- Run colmena apply to update the remaining nodes (removes the node from the k3s agent token scope).
Services
All services run on the master node as Kubernetes workloads. They are deployed and managed by the k8s-deploy systemd service, which applies charts in dependency order every time the NixOS configuration changes.
Service configuration lives in modules/system/k8s/services/. Most values are driven by vars.nix. Charts are pulled from nixhelm and rendered to YAML by nix-kube-generators.
MetalLB
Namespace: metallb-system
Chart source: nixhelm / metallb
IP pool: vars.metallbPool
MetalLB provides LoadBalancer-type Services on bare metal by assigning IPs from vars.metallbPool and responding to ARP requests on your LAN. Without MetalLB, type: LoadBalancer services would stay in <Pending> state forever.
The pool is configured via a metallb-config manifest (an IPAddressPool + L2Advertisement resource pair) deployed in the core-config group after MetalLB itself is ready.
Services that get a static IP from the pool:
- Pi-hole: vars.piholeIp
- WireGuard: vars.wireguardIp
- nginx ingress: vars.nginxIp
Longhorn
Namespace: longhorn-system
Chart source: nixhelm / longhorn
Longhorn provides distributed block storage across all cluster nodes. It creates replicated PersistentVolume objects for stateful services (Nextcloud, its PostgreSQL database, WireGuard data).
Every node has the Longhorn node label set at k3s startup:
--node-label=node.longhorn.io/create-default-disk=true
This tells Longhorn to use the node's default disk (under /var/lib/longhorn/) for replica storage. Longhorn data is persisted across reboots via the impermanence module (/var/lib/longhorn is bind-mounted from /persist).
Storage class name: longhorn (used by all PVC definitions in the cluster).
nginx ingress
Namespace: nginx-system
Chart source: nixhelm / ingress-nginx
IP: vars.nginxIp
The nginx ingress controller handles all external HTTP/S traffic. Ingress objects in other namespaces use ingressClassName: nginx.
TLS termination is handled by cert-manager (via cert-manager.io/cluster-issuer annotation on each Ingress). nginx passes the decrypted request upstream to the service.
The controller is assigned a static LoadBalancer IP from MetalLB (vars.nginxIp). ExternalDNS watches Ingress objects and registers their hostnames in Pi-hole (local) and Cloudflare (external).
Pi-hole
Namespace: pihole-system
Chart source: nixhelm / mojo2600/pihole
Version: configured in charts.nix (vars.versions.pihole)
IP: vars.piholeIp (shared by DNS and web UI)
Pi-hole provides LAN-wide DNS filtering and ad blocking. Both the DNS service (UDP/TCP 53) and the web UI share the same LoadBalancer IP via MetalLB's IP sharing (metallb.universe.tf/allow-shared-ip).
Upstream DNS: vars.upstreamDns (typically your router).
Custom DNS entries are injected at deploy time:
- pihole.home → vars.piholeIp (local admin UI access)
- nextcloud.<vars.domain> → 10.0.100.1 - routes VPN clients to the caddy sidecar instead of the nginx ingress. This enables Nextcloud's VPN-based SSO without affecting LAN clients, who hit nginx normally.
To access the Pi-hole admin UI: http://pihole.home/admin from your LAN (configure your device's DNS to point to vars.piholeIp first, or set it on the router for all devices).
ExternalDNS (Pi-hole)
Namespace: pihole-system
Chart source: nixhelm / external-dns
Watches Ingress and Service objects and automatically registers/removes DNS entries in Pi-hole. This means any service with an Ingress gets a local DNS name without manual Pi-hole configuration.
The ExternalDNS Pi-hole provider reads the Pi-hole API to manage entries. It runs in the same namespace as Pi-hole.
ExternalDNS (Cloudflare)
Namespace: external-dns
Chart source: nixhelm / external-dns
A second ExternalDNS instance that registers public DNS records in Cloudflare. It watches the same Ingress objects but only registers hostnames that match vars.domain.
Requires the Cloudflare API token (from secrets.yaml) injected as a Kubernetes Secret.
cert-manager
Namespace: cert-manager
Chart source: nixhelm / cert-manager
Issues Let's Encrypt TLS certificates via DNS-01 challenge. DNS-01 is required for wildcard certificates and works without the cluster being publicly reachable: cert-manager publishes the challenge TXT record through the Cloudflare API, so Let's Encrypt never has to connect to the cluster.
Two ClusterIssuer resources are created:
- letsencrypt-prod - production certificates (used by all services)
- letsencrypt-staging - for testing without hitting rate limits
To use staging: change vars.tls.defaultIssuer in charts.nix to "letsencrypt-staging".
The Cloudflare API token and email are injected as Kubernetes Secrets from SOPS (cloudflare_token, cloudflare_email).
DDNS (Cloudflare)
Namespace: external-dns
Chart source: custom manifest
A CronJob that periodically resolves your public IP and updates a Cloudflare A record. This keeps your public DNS pointing at your home IP even if it changes.
The Cloudflare credentials are the same ones used by cert-manager.
WireGuard
Namespace: wireguard-system
Image: lscr.io/linuxserver/wireguard:latest
IP: vars.wireguardIp
The WireGuard pod runs two containers:
- wireguard container: the VPN server itself. The server config (wg0.conf) is generated from a template in a ConfigMap. An init container fills in the server private key and user public keys by reading Kubernetes Secrets (which the k8s-deploy service populated from SOPS).
- Caddy sidecar: Caddy terminates HTTPS on port 443 of the WireGuard pod's IP (10.0.100.1 inside the VPN). It matches incoming requests by the client's VPN IP and injects X-Remote-User: <nextcloudUser> before proxying to Nextcloud's internal service. This gives VPN users automatic login to Nextcloud without a password.
The Caddy TLS certificate is issued by cert-manager for nextcloud.<vars.domain>. cert-manager writes the certificate Secret into the namespace of the Certificate resource, and a pod can only mount Secrets from its own namespace, so Caddy (running in wireguard-system) cannot use the Secret from the nextcloud namespace. A dedicated Certificate resource is therefore created in wireguard-system.
Pi-hole's custom DNS routes nextcloud.<vars.domain> to 10.0.100.1 for VPN clients, so the VPN client's HTTPS request hits Caddy instead of nginx.
VPN user public keys are stored in SOPS and pushed to a Kubernetes Secret (wireguard-secrets) by the deployment script. The init container reads them at pod start and substitutes the placeholders in wg0.conf.
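The substitution step can be sketched as follows. This is an illustration only: the @NAME@ placeholder syntax and the function name are assumptions, and the real init container may use a different template format:

```shell
# Sketch of the init-container step: fill placeholders in a wg0.conf template.
# The @NAME@ placeholder syntax and render_wg_conf are hypothetical.
# Values are assumed to be base64 WireGuard keys (A-Z a-z 0-9 + / =), so they
# never contain the sed delimiter '|' or the special characters '&' and '\'.
render_wg_conf() {
  local template="$1" pair name value script=""
  shift
  for pair in "$@"; do
    name="${pair%%=*}"    # text before the first '='
    value="${pair#*=}"    # text after the first '=' (keys may end in '=')
    script="${script}s|@${name}@|${value}|g;"
  done
  sed -e "$script" "$template"
}
```

For example, `render_wg_conf wg0.conf.tpl "SERVER_PRIVATE_KEY=$(cat /path/to/key)" > wg0.conf` would write the filled-in config, where both file names are placeholders.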
For managing VPN users, see WireGuard VPN.
Nextcloud
Namespace: nextcloud
Chart source: nixhelm / nextcloud
URL: https://nextcloud.<vars.domain>
Nextcloud is deployed with:
- PostgreSQL: 8Gi Longhorn PVC
- Redis: for session caching
- 100Gi Longhorn PVC: for file storage
- nginx ingress with cert-manager TLS
All credentials (admin password, DB password, Redis password) come from SOPS secrets pushed to Kubernetes Secrets before the chart deploys.
Nextcloud is configured to trust the k3s pod CIDR (10.42.0.0/16) as a proxy and to accept X-Forwarded-For headers. It also trusts the X-Remote-User header for auto-login when the request comes via the WireGuard caddy sidecar.
The Nextcloud SSO setup script (modules/system/k8s/scripts/nextcloud-sso.nix) runs after all charts are deployed to configure trusted domains and the remote user header.
Signal proxy
Namespace: signal-proxy
URL: https://signal.<vars.domain>
A Signal messenger proxy that lets Signal clients connect through your homelab instead of directly to Signal's servers. Useful for regions where Signal is blocked.
The ingress uses cert-manager TLS and the nginx ingress controller.
WireGuard VPN
nixlab includes a WireGuard VPN server running as a Kubernetes pod on the master node. It provides encrypted remote access to your homelab and optional automatic login to Nextcloud for VPN users.
How it works
The WireGuard pod runs two containers:
- wireguard: the VPN server on UDP port 51820 (vars.wireguardIp)
- caddy sidecar: an HTTPS proxy on TCP port 443 that injects X-Remote-User headers for Nextcloud SSO
When a VPN user connects to Nextcloud from within the tunnel, Pi-hole resolves nextcloud.<vars.domain> to 10.0.100.1 (the WireGuard server's VPN IP) instead of the nginx ingress IP. The request hits Caddy, which identifies the user by their VPN IP, injects the X-Remote-User header with their Nextcloud username, and proxies to Nextcloud. Nextcloud trusts this header and logs the user in automatically.
LAN users (not on VPN) resolve nextcloud.<vars.domain> to the nginx ingress and go through normal authentication.
VPN subnet
The VPN uses 10.0.100.0/24:
- 10.0.100.1 - WireGuard server (caddy sidecar also listens here)
- 10.0.100.2 and up - clients (assigned per user in vars.wireguardUsers)
DNS for VPN clients is vars.piholeIp - Pi-hole blocks ads and resolves local hostnames for VPN users the same as LAN users.
Adding a user with add-wg-user.sh
The script at modules/system/sops/add-wg-user.sh automates the full onboarding flow:
cd nixlab
bash modules/system/sops/add-wg-user.sh <username>
What it does:
- Reads vars.wireguardUsers via nix eval to find used IPs.
- Picks the next free IP in 10.0.100.0/24.
- Generates a WireGuard keypair with wg genkey.
- Stores the public and private keys in modules/system/sops/secrets.yaml via sops --set.
- Reads the server public key and endpoint from SOPS.
- Prints a ready-to-use client config and the vars.nix snippet to add.
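The IP selection step can be sketched like this. It is not the script's actual code: the function name is hypothetical, and it assumes "next free" means the lowest unused client address, which may differ from what add-wg-user.sh does:

```shell
# Sketch: pick the lowest unused client address in 10.0.100.0/24.
# .1 is the WireGuard server, so clients start at .2; .255 is the broadcast address.
# Arguments are the IPs already present in vars.wireguardUsers.
next_free_ip() {
  local used=" $* " host
  for host in $(seq 2 254); do
    case "$used" in
      *" 10.0.100.${host} "*) ;;                        # already assigned
      *) echo "10.0.100.${host}"; return 0 ;;
    esac
  done
  return 1                                              # subnet exhausted
}
```

The surrounding spaces in the match pattern stop 10.0.100.2 from being mistaken for 10.0.100.20.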
Example output:
Client config for alice
============================
[Interface]
PrivateKey = <alice-private-key>
Address = 10.0.100.2/32
DNS = 192.168.1.250
[Peer]
PublicKey = <server-public-key>
Endpoint = <your-public-ip>:51820
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 25
Add to vars.nix wireguardUsers
============================
"alice" = {
ip = "10.0.100.2";
publicKeySecret = "alice_wg_public_key";
allowedIPs = "0.0.0.0/0";
enabled = true;
};
After running the script:
- Paste the vars.nix snippet into wireguardUsers, adding group, description, and optionally nextcloudUser:
wireguardUsers = {
  "alice" = {
    ip = "10.0.100.2";
    group = "admin";
    publicKeySecret = "alice_wg_public_key";
    allowedIPs = "0.0.0.0/0";
    nextcloudUser = "alice"; # optional: enables Nextcloud SSO
    description = "Alice - full access";
    enabled = true;
  };
};
- Deploy to the master:
colmena apply --on @master
The activation script updates the WireGuard ConfigMap with the new peer. The k8s-deploy service restarts the WireGuard pod to apply the new config.
- Send the client config to the user (the block printed by the script). The private key is already embedded - the user just imports it.
Retrieving a user's private key later
The private key is stored in SOPS. To recover it:
sops --decrypt --extract '["alice_wg_private_key"]' modules/system/sops/secrets.yaml
Disabling a user
Set enabled = false in vars.nix:
"alice" = {
...
enabled = false;
};
Deploy: colmena apply --on @master
The user's public key is removed from the WireGuard server config and their SOPS secret is unregistered from NixOS. The secret entry in secrets.yaml is left in place (to preserve the key if you re-enable the user).
Removing a user entirely
- Set enabled = false and deploy to confirm the peer is removed.
- Remove the user's entry from vars.wireguardUsers in vars.nix.
- Remove their keys from secrets.yaml:
sops modules/system/sops/secrets.yaml
# delete the alice_wg_public_key and alice_wg_private_key entries
- Deploy: colmena apply --on @master
Client setup
Linux (wg-quick)
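# Save the config from add-wg-user.sh output as /etc/wireguard/nixlab.conf
sudo mkdir -p /etc/wireguard
sudo nano /etc/wireguard/nixlab.conf # paste the [Interface] + [Peer] block
sudo chmod 600 /etc/wireguard/nixlab.conf # the file contains the private key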
# Connect
sudo wg-quick up nixlab
# Disconnect
sudo wg-quick down nixlab
# Auto-start on boot
sudo systemctl enable wg-quick@nixlab
macOS
Install the WireGuard app from the App Store. Click the + button and import the config file (save the output of add-wg-user.sh as a .conf file).
iOS / Android
Install the WireGuard app from the App Store or Google Play. Use the QR code option - generate a QR code from the config on your workstation:
# Render the config as a QR code in the terminal (nix run fetches qrencode on demand)
nix run nixpkgs#qrencode -- -t ansiutf8 < alice.conf
Or use the app's "Import from file" option.
Windows
Install WireGuard for Windows. Use "Import tunnel(s) from file" and select the .conf file.
Access groups
The group field in vars.wireguardUsers is a label - it doesn't currently enforce any network policy. It's intended for documentation and future use (e.g., network policies to restrict which cluster services different groups can reach).
Suggested conventions:
- admin - full access, including Nextcloud SSO
- family - homelab services, limited external routing
- friends - split tunnel, homelab access only
- guests - internet-only via VPN (no homelab access)
To implement network isolation between groups, add Kubernetes NetworkPolicy resources in the relevant namespaces based on the source VPN IP ranges for each group.
Nextcloud SSO mechanics
The caddy sidecar's Caddyfile is generated at build time from vars.wireguardUsers. For each user with nextcloudUser set, it generates a block like:
@alice remote_ip 10.0.100.2
handle @alice {
reverse_proxy http://nextcloud.nextcloud.svc.cluster.local:8080 {
header_up Host nextcloud.yourdomain.example.com
header_up X-Remote-User "alice"
}
}
Requests from unrecognized VPN IPs get a 403. VPN users without nextcloudUser set will receive a 403 when accessing Nextcloud over the VPN - they should use the normal LAN or internet path instead.
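The per-user generation and the 403 fallback can be sketched in shell. nixlab builds the Caddyfile in Nix, so this is only a sketch of the logic: the function name, the input format (one "name ip nextcloudUser" line per user, with "-" for an unset nextcloudUser) and the catch-all handle block are assumptions:

```shell
# Sketch: emit one route per user with nextcloudUser set, then a catch-all 403.
# Reads "name ip nextcloudUser" lines on stdin; "-" means nextcloudUser is unset.
render_caddy_routes() {
  local domain="$1" user ip nc_user
  while read -r user ip nc_user; do
    [ -n "$user" ] || continue
    [ "$nc_user" != "-" ] || continue   # no SSO mapping: falls through to the 403
    cat <<EOF
@${user} remote_ip ${ip}
handle @${user} {
    reverse_proxy http://nextcloud.nextcloud.svc.cluster.local:8080 {
        header_up Host nextcloud.${domain}
        header_up X-Remote-User "${nc_user}"
    }
}
EOF
  done
  # Anything that matched no user block is rejected.
  printf 'handle {\n    respond 403\n}\n'
}
```

Running `printf 'alice 10.0.100.2 alice\nbob 10.0.100.3 -\n' | render_caddy_routes yourdomain.example.com` prints a route for alice only, followed by the catch-all block.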
The Nextcloud Helm chart is configured with:
'trusted_proxies' => ['10.42.0.0/16'],
'forwarded_for_headers' => ['HTTP_X_FORWARDED_FOR'],
This makes Nextcloud trust the X-Remote-User header when it comes from the k3s pod CIDR - where caddy runs.
Adding a Kubernetes Chart
All Kubernetes services are defined as Nix files under modules/system/k8s/services/. There is no raw YAML in the repo - everything is rendered at build time by nix-kube-generators. Helm charts are sourced from nixhelm. This page walks through adding a new service from scratch.
The four-step process
- Create a service file under services/<category>/
- Add a namespace to vars.namespaces in charts.nix (if needed)
- Import the file in charts.nix
- Add the chart name(s) to a deployment group in default.nix
Step 1: Write the service file
Create modules/system/k8s/services/<category>/myapp.nix. Every service file has the same signature:
{ pkgs, inputs, lib, vars }:
And returns an attrset where each key is a chart name and each value is the result of lib.mkChart, lib.mkRawManifest, or lib.mkSecretRef.
Helm chart
Use lib.mkChart when there is a Helm chart available via nixhelm:
{ pkgs, inputs, lib, vars }:
{
myapp = lib.mkChart {
name = "myapp";
chart = lib.nixhelm.<org>.<chart>; # see nixhelm.charts for available charts
namespace = vars.namespaces.myapp;
values = {
replicaCount = 1;
service.type = "LoadBalancer";
service.loadBalancerIP = vars.ipPools.nginxExternal;
};
};
}
If you want to layer defaults with overrides:
let
defaults = { replicaCount = 1; resources.limits.memory = "256Mi"; };
overrides = { service.type = "ClusterIP"; };
finalValues = lib.overlayValues defaults overrides;
in {
myapp = lib.mkChart {
name = "myapp";
chart = lib.nixhelm.bitnami.myapp;
namespace = vars.namespaces.myapp;
values = finalValues;
};
}
Raw manifest
Use lib.mkRawManifest when you need to write Kubernetes resources directly as Nix attribute sets - no Helm chart involved:
{ pkgs, inputs, lib, vars }:
let
deploymentResource = {
apiVersion = "apps/v1";
kind = "Deployment";
metadata = {
name = "myapp";
namespace = vars.namespaces.myapp;
};
spec = {
replicas = 1;
selector.matchLabels = { app = "myapp"; };
template = {
metadata.labels = { app = "myapp"; };
spec.containers = [{
name = "myapp";
image = "myapp:latest";
ports = [{ containerPort = 8080; }];
}];
};
};
};
serviceResource = {
apiVersion = "v1";
kind = "Service";
metadata = { name = "myapp"; namespace = vars.namespaces.myapp; };
spec = {
selector = { app = "myapp"; };
type = "ClusterIP";
ports = [{ port = 80; targetPort = 8080; }];
};
};
in {
myapp = lib.mkRawManifest {
name = "myapp";
namespace = vars.namespaces.myapp;
resources = [ deploymentResource serviceResource ];
};
}
A single service file can return multiple chart keys. The deployment group in default.nix references them individually.
Secret reference
Use lib.mkSecretRef to inject a SOPS secret into a Kubernetes Secret that your workloads mount:
myapp-password = lib.mkSecretRef {
name = "myapp-password"; # chart key (must be unique across all services)
namespace = vars.namespaces.myapp;
secretName = "myapp-credentials"; # name of the Kubernetes Secret object
secretKey = "password"; # key inside the Secret
sopsSecretName = "myapp_password"; # key in secrets.yaml / sops/default.nix
};
At deploy time, k8s-deploy reads the decrypted value from the SOPS-managed file and patches it into the Kubernetes Secret. Multiple mkSecretRef entries can target the same secretName with different keys - patch-merge handles this without clobbering other keys.
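A rough sketch of what such a patch could look like with kubectl, assuming a JSON merge patch and that the Secret already exists. The helper name is hypothetical and this is not the actual k8s-deploy code:

```shell
# Sketch: build a merge-patch body that sets one key of a Kubernetes Secret.
# Secret data must be base64-encoded; tr removes the line wrapping that some
# base64 implementations add to long values.
secret_patch_body() {
  local key="$1" value="$2" encoded
  encoded="$(printf '%s' "$value" | base64 | tr -d '\n')"
  printf '{"data":{"%s":"%s"}}' "$key" "$encoded"
}

# Usage against a live cluster (not run here):
#   kubectl patch secret myapp-credentials -n myapp-system --type merge \
#     -p "$(secret_patch_body password "$(cat /run/secrets/myapp_password)")"
```

Because a merge patch only touches the keys it names, a second call with a different key leaves the first one in place.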
The corresponding SOPS secret must be declared in modules/system/sops/default.nix:
sops.secrets.myapp_password = { owner = config.users.users.${username}.name; };
And added to modules/system/sops/secrets.yaml.example (and your encrypted secrets.yaml).
Step 2: Add a namespace
Open modules/system/k8s/charts.nix and add your namespace to vars.namespaces:
namespaces = {
# ...existing entries...
myapp = "myapp-system";
};
The deployment script creates all declared namespaces before applying any charts, so you don't need to create it manually.
Step 3: Import in charts.nix
Add your service file to the appropriate group in charts.nix. Pick the category that fits or add a new one:
appServices = {
signalProxy = import ./services/apps/signal-proxy.nix { inherit pkgs inputs lib vars; };
nextcloud = import ./services/apps/nextcloud.nix { inherit pkgs inputs lib vars; };
myapp = import ./services/apps/myapp.nix { inherit pkgs inputs lib vars; }; # added
};
If your service needs config (to read SOPS secret paths), pass it too:
myapp = import ./services/apps/myapp.nix { inherit pkgs inputs lib vars config; };
Step 4: Add to a deployment group
Open modules/system/k8s/default.nix and add the chart name(s) to an existing group or create a new one.
Adding to an existing group
{
name = "apps";
charts = [ "signal-proxy" "nextcloud" "myapp" ]; # added myapp
dependsOn = [ "core-config" "networking-services" "external-access" ];
waitFor = {
# ...existing waitFor entries...
myapp = {
kind = "deployment";
name = "myapp";
namespace = "myapp-system";
timeout = 120;
};
};
}
Creating a new group
If your service has different dependencies, add a new group in the right position in the list:
{
name = "myapp-services";
charts = [ "myapp" ]; # mkSecretRef keys such as "myapp-password" are not listed in groups
dependsOn = [ "core-config" "networking-services" ];
retryAttempts = 3; # default; optional
retryDelay = 10; # seconds between retries; optional
waitFor = {
myapp = {
kind = "deployment";
name = "myapp";
namespace = "myapp-system";
timeout = 180;
};
};
}
Group fields:
| Field | Required | Default | Description |
|---|---|---|---|
| name | yes | - | Unique identifier; used for sentinel files under /var/lib/kubernetes/ |
| charts | yes | - | Chart keys to deploy; must exist in regularCharts (i.e. mkChart or mkRawManifest, not mkSecretRef) |
| dependsOn | no | [] | Group names that must have completed before this group runs |
| waitFor | no | {} | Resources to wait for after deploying this group before proceeding |
| retryAttempts | no | 3 | How many times to retry a failed kubectl apply |
| retryDelay | no | 10 | Seconds to wait between retries |
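The retry behaviour of retryAttempts and retryDelay can be pictured with a small helper. This is a sketch, not the k8s-deploy implementation; it assumes retryAttempts counts total attempts, which the real script may interpret differently:

```shell
# Sketch: run a command up to <attempts> times, sleeping <delay> seconds
# between failures. Usage: retry <attempts> <delay> <command> [args...]
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after ${n} attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example with the defaults from the table (not run here):
#   retry 3 10 kubectl apply -f /var/lib/kubernetes/manifests/myapp.yaml
```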
waitFor values:
| Field | Description |
|---|---|
| kind | "deployment" uses kubectl rollout status; anything else uses kubectl wait --for=condition=Available |
| name | Resource name |
| namespace | Resource namespace |
| timeout | Seconds before giving up (warning only - deploy continues) |
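The choice of command by kind can be sketched as a function that prints the kubectl invocation for one waitFor entry. The function name is hypothetical and the exact flags used by k8s-deploy are an assumption:

```shell
# Sketch: map one waitFor entry to the kubectl command described in the table.
# Usage: wait_command <kind> <name> <namespace> <timeout-seconds>
wait_command() {
  local kind="$1" name="$2" namespace="$3" timeout="$4"
  if [ "$kind" = "deployment" ]; then
    echo "kubectl rollout status deployment/${name} -n ${namespace} --timeout=${timeout}s"
  else
    echo "kubectl wait --for=condition=Available ${kind}/${name} -n ${namespace} --timeout=${timeout}s"
  fi
}

# A timeout only warns; the deploy carries on (not run here):
#   $(wait_command deployment myapp myapp-system 120) \
#     || echo "warning: myapp not ready after 120s" >&2
```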
The vars object
Every service file receives vars from charts.nix. The full set of available values:
vars = {
domain = "yourdomain.example.com"; # from vars.nix
upstreamDns = "192.168.1.1"; # from vars.nix
wireguardUsers = { ... }; # from vars.nix
namespaces = {
dns = "dns-system";
pihole = "pihole-system";
nginx = "nginx-system";
metallb = "metallb-system";
longhorn = "longhorn-system";
monitoring = "monitoring-system";
wireguard = "wireguard-system";
signalProxy = "signal-proxy";
nextcloud = "nextcloud";
# ...your additions
};
ipPools = {
metallb = "192.168.1.192/26";
nginxExternal = "192.168.1.193";
pihole = "192.168.1.250";
wireguard = "192.168.1.194";
};
piholeIp = "192.168.1.250"; # alias for ipPools.pihole
versions = { pihole = "2025.11.1"; }; # pinned image versions
defaultReplicas = 1;
tls = {
defaultIssuer = "letsencrypt-prod";
stagingIssuer = "letsencrypt-staging";
acmeServerProduction = "https://acme-v02.api.letsencrypt.org/directory";
acmeServerStaging = "https://acme-staging-v02.api.letsencrypt.org/directory";
};
};
The lib object
Functions available in service files:
| Function | Description |
|---|---|
| lib.mkChart { name, chart, namespace, values } | Renders a Helm chart to YAML |
| lib.mkRawManifest { name, namespace, resources } | Renders a list of Nix attrsets to a YAML stream |
| lib.mkSecretRef { name, namespace, secretName, secretKey, sopsSecretName } | Injects a SOPS secret into a Kubernetes Secret |
| lib.overlayValues defaults overrides | Deep-merges two attrsets, with overrides winning |
| lib.nixhelm | All charts available via nixhelm - reference as lib.nixhelm.<org>.<chart> |
| lib.kubelib | nix-kube-generators utilities (buildHelmChart, toYAMLStreamFile) |
Deploy
After making the changes:
# Check the flake evaluates
nix flake show
# Deploy to the master
colmena apply --on @master
The activation script (kubernetes-prepare) writes the rendered YAML to /var/lib/kubernetes/manifests/<chartname>.yaml and restarts k8s-deploy. The new chart is deployed in the order defined by its deployment group.
To watch the deployment live:
ssh youruser@master.local
sudo journalctl -fu k8s-deploy
Managing Secrets
All secrets are stored in modules/system/sops/secrets.yaml, encrypted with your age key via SOPS. The file is safe to commit - SOPS encryption means only the holder of the age private key can decrypt it. sops-nix decrypts the file at activation time and writes each secret to a tmpfs path that NixOS modules and the k8s deploy script can read.
Adding a new secret
Adding a secret requires three changes:
- Add the key to secrets.yaml
- Declare it in modules/system/sops/default.nix
- Reference the path in your module
1. Edit secrets.yaml
secrets.yaml is an encrypted YAML file. Open it in-place with SOPS - it decrypts to your editor, re-encrypts on save:
sops modules/system/sops/secrets.yaml
Add your key:
myapp_api_key: "the-actual-secret-value"
Save and close. SOPS re-encrypts immediately. If you have not yet encrypted the file (initial setup), fill it in plain text first and then encrypt:
sops --encrypt --in-place modules/system/sops/secrets.yaml
Also add a placeholder to modules/system/sops/secrets.yaml.example so future users know the key exists:
myapp_api_key: "your-myapp-api-key"
2. Declare in sops/default.nix
Open modules/system/sops/default.nix and add an entry inside sops.secrets:
sops.secrets = {
# ...existing secrets...
myapp_api_key = { owner = "${username}"; };
};
sops-nix decrypts this key and writes it to /run/secrets/myapp_api_key during system activation, which runs at boot and whenever a new configuration is switched in.
Secret options
| Option | Default | Description |
|---|---|---|
| owner | "root" | Unix user that owns the decrypted file |
| group | "root" | Unix group that owns the decrypted file |
| mode | "0400" | File permissions on the decrypted file |
| path | /run/secrets/<name> | Override where the decrypted file is written |
| neededForUsers | false | Set true for secrets used in users.users.<name>.hashedPasswordFile - decrypted before user activation |
| restartUnits | [] | systemd units to restart when this secret changes |
Examples:
# Readable only by root (default)
myapp_api_key = {};
# Readable by a specific user
myapp_api_key = { owner = "${username}"; };
# Readable by all (e.g. a public key or non-sensitive config)
wireguard_server_public_key = { owner = "root"; mode = "0644"; };
# Written to a custom path (e.g. expected by a hardcoded service)
private_ssh_key = {
path = "/home/${username}/.ssh/id_ed25519";
mode = "0600";
owner = "${username}";
};
# User password - must be decrypted before users are activated
user_password = { neededForUsers = true; };
3. Reference in your module
Use config.sops.secrets.<name>.path to get the runtime path of the decrypted file:
{ config, ... }:
{
services.myapp = {
enable = true;
# Pass the path to the decrypted file, not the value itself
apiKeyFile = config.sops.secrets.myapp_api_key.path;
};
}
Or read it inline in a shell script (e.g. inside a systemd ExecStart):
systemd.services.myapp = {
script = ''
API_KEY=$(cat ${config.sops.secrets.myapp_api_key.path})
exec myapp --api-key "$API_KEY"
'';
};
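For scripts that read several secrets, a small helper that fails loudly when a file is missing can be easier to debug than a bare cat. This is a sketch; the read_secret name and the SECRETS_DIR override are hypothetical conveniences, not part of nixlab or sops-nix:

```shell
# Sketch: read a decrypted secret by name, failing with a clear message when
# the file is missing or unreadable. SECRETS_DIR defaults to /run/secrets.
read_secret() {
  local name="$1" dir="${SECRETS_DIR:-/run/secrets}"
  local file="${dir}/${name}"
  if [ ! -r "$file" ]; then
    echo "secret ${name} is not readable at ${file}" >&2
    return 1
  fi
  cat "$file"
}

# Example (not run here):
#   API_KEY="$(read_secret myapp_api_key)" || exit 1
```

The unit's user must match the owner declared for the secret in sops/default.nix, otherwise the readability check fails.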
Injecting a secret into a Kubernetes Secret
For secrets used by Kubernetes workloads, use lib.mkSecretRef in the service file instead of referencing config.sops.secrets directly. The k8s-deploy script reads the decrypted file and patches it into a Kubernetes Secret object.
See Adding a Chart - Secret reference for the full workflow.
Rotating a secret
- Open the file: sops modules/system/sops/secrets.yaml
- Change the value
- Save - SOPS re-encrypts
- Redeploy: colmena apply
sops-nix detects the changed secret and restarts any units listed in restartUnits for that secret. Services that read the decrypted file when they start (rather than having the value fixed at build time) pick up the new value on that restart.
Re-keying (replacing the age key)
If you need to rotate the age key itself:
# Generate a new key
age-keygen -o ~/.config/sops/age/keys.txt.new
# Update .sops.yaml with the new public key, then re-encrypt
sops updatekeys modules/system/sops/secrets.yaml
# Remove the old key
mv ~/.config/sops/age/keys.txt.new ~/.config/sops/age/keys.txt
Update /persist/etc/sops-nix/keys.txt on each node with the new private key, then redeploy.