
homelab

Talos cluster

The repo contains the ansible code to init the actual k8s cluster and the apps inside that cluster. The ansible is designed to be idempotent, so you can run the command repeatedly against a cluster and it will only bring up what isn't already there. There are 2 phases:

  1. init the cluster infra
  2. bring up the apps
     a. core apps
     b. user apps
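The two phases correspond to the make targets used throughout this README:

```shell
# phase 1: init the cluster infra
make infra

# phase 2a: bring up the core apps (metallb, ingress, cert-manager, ...)
make core-apps

# phase 2b: bring up the user apps
make all-apps
```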

Physical setup

If the cluster gets into an irreparable state, you may need to reinstall the os. This isn't as daunting as it sounds, as ansible brings the cluster back up fairly smoothly to previous parity (not the exact same state). This works because the vast majority of the persistent state is stored on the nas.

To run the infra playbook, you need to have 3 nodes running in talos maintenance mode (booted from a live usb). Here are the steps to get to that state (~30 mins per node):

  • wipe the previous os if it was talos; to do that you need to install a non-talos os:
    • boot and spam F2 to reach the BIOS
    • turn off secure boot and raise the priority of your live usb in the boot order
    • install the non-talos os
  • once the non-talos os is installed, reboot with the talos live usb inserted
  • spam F2 again to enter the BIOS, raise the priority of the talos live usb and turn secure boot back on
  • do not leave your computer at this point: wait for a GRUB-like menu where you can select "enroll secure boot keys" for talos
  • once the keys are enrolled, boot into the talos live usb. It runs in ram, so you can remove the usb. Talos is now in maintenance mode and ready to be configured.
  • repeat this process until all nodes are in maintenance mode
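To confirm a node really is in maintenance mode before running the playbook, you can query it with talosctl; a maintenance-mode node has no pki yet, so the `--insecure` flag is needed (the node ip below is an example):

```shell
# query a maintenance-mode node without client certs;
# 192.168.222.201 is an example node address
talosctl -n 192.168.222.201 version --insecure
```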

Note: when returning the OptiPlexes to the rack, make sure to insert them from the back of the rack or you risk snapping bits of plastic off the case.

Steps to run before the init cluster infra step

  • rm -rf talos/rendered/
  • rm ~/.talos/config

Command

make infra

Init core apps

Prerequisites: create a secret_vars.yaml with an 'email_addr' value, and create a cloudflare_config.json with the relevant values. I've cloned the cloudflare ddns repo so that I don't have to pin the docker image to "latest" as the original repo does.
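A minimal sketch of the first prerequisite file; only 'email_addr' is named above, and the value here is an example:

```yaml
# secret_vars.yaml -- only 'email_addr' is referenced above; the value is a placeholder
email_addr: admin@example.com
```

cloudflare_config.json needs whatever fields the cloned cloudflare ddns image expects (typically an api token and the records to update); check the cloned repo for the exact schema rather than guessing it here.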

  • metallb
  • nginx ingress controller
  • cert manager
  • blocky (dns proxy and ad blocker)
  • wire guard
  • dynamic dns
  • nfs storage csi
  • kube-prometheus-stack
  • loki-stack
  • version-checker

To run ansible that uses the kubernetes community collection, you need to run the ansible commands from a python venv:

. venv/homelab/bin/activate

make core-apps

make all-apps

make app app=<app-name> to deploy a single app

Core explained

home server network diagram

Metallb provides 3 ips that it load balances from; the fourth address below is the talos control plane vip, which talos manages itself.

  • 192.168.222.197 - the cluster load balancer for apps etc
  • 192.168.222.198 - wireguard server
  • 192.168.222.199 - load balancer used for internal network traffic to access the blocky dns server
  • 192.168.222.200 - talos control plane vip
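A hedged sketch of how the first three addresses might be handed to metallb using its v1beta1 CRDs (the resource and namespace names here are illustrative, and the talos vip at .200 is configured on the talos side, not in metallb):

```yaml
# illustrative metallb config; pool and namespace names are assumptions
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.222.197-192.168.222.199
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
```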

Blocky is the network wide dns server: all traffic on the vpn or local networks resolves dns via blocky. It also serves as an ad blocker by dns sinkholing any unwanted traffic. We run 3 replicas and a redis server to store a cache of dns queries across the instances. Blocky is also configured to use dns over tls (DoT), which encrypts dns packets so that third parties, especially our ISP, can't snoop.
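A hedged sketch of the relevant parts of a blocky config; the exact key names vary between blocky versions and the blocklist url is just an example, so treat this as illustrative:

```yaml
# illustrative blocky config fragment (keys depend on the blocky version)
upstreams:
  groups:
    default:
      - tcp-tls:1.1.1.1:853   # DoT upstream, so the ISP can't read queries
      - tcp-tls:9.9.9.9:853
blocking:
  denylists:
    ads:
      - https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  clientGroupsBlock:
    default:
      - ads
redis:
  address: redis:6379   # shared query cache across the 3 replicas
```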

Wireguard is our vpn and allows users to tunnel safely and securely into our network. Once the router directs us to the wireguard app, we are then pointed to the correct app by our blocky dns server, which redirects to the cluster ip.

External or non-local traffic flows into the router via the vpn, from the vpn to blocky for dns resolution, and on to whichever app is requested in the cluster.

Internal or local traffic flows from inside the network via blocky and then out to the internet or to whatever local service it needs.

Monitoring

The monitoring namespace contains kube-prometheus-stack, which sets up prometheus, grafana and alertmanager. It also contains the loki-stack chart, which deploys loki for logging. It's important to note that this stack is no longer supported and cannot push the loki version past 2.9.3. Find the loki dashboard here

Init user apps

mealie

Contains ~3000 recipes scraped programmatically. The mealie app doesn't appear in version checker because it is hosted on github, so I need to check that manually. There is a cron job in k8s that triggers the db backup at 1am on the 1st of each month; the backup is then pushed to a bucket at 3am on the 2nd of every month, triggered by a cron in TrueNAS.
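The 1am-on-the-1st schedule translates to the cron expression `0 1 1 * *`; a hedged sketch of the k8s CronJob, where the names, image and backup command are all assumptions:

```yaml
# illustrative CronJob; only the schedule is taken from the text above
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mealie-db-backup      # name assumed
  namespace: mealie           # namespace assumed
spec:
  schedule: "0 1 1 * *"       # 01:00 on the 1st of every month
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: mealie-backup:latest          # placeholder image
              command: ["/bin/sh", "-c", "run-backup"]   # placeholder command
```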

actual-budget

The actual budget server keeps data in sync across the clients. The server and the client should be on the same version. Backups are taken on the client device.

photo-prism

Photos are pushed from the client devices (when they are on charge) to the photoprism instance. Photoprism has a mariadb for metadata, and the actual images are kept on the nfs (TrueNAS). If you ever get into trouble with the mariadb instance, which has a pvc, don't hesitate to blast the pvc and restart the mariadb instance; you can then restore the db from the nfs backup found in photo_prism/storage/backup/mysql. Backups are pushed to a bucket nightly (via TrueNAS sync).
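The "blast the pvc" recovery might look something like the following sketch; the resource names and restore command are assumptions, not taken from the actual manifests:

```shell
# hypothetical recovery sketch -- resource names are assumptions
kubectl -n photo-prism scale deploy/photoprism --replicas=0
kubectl -n photo-prism delete pvc mariadb-data            # "blast the pvc"
kubectl -n photo-prism rollout restart statefulset/mariadb
# then restore from the nfs backup mentioned above, e.g.:
# mysql -u root -p photoprism < photo_prism/storage/backup/mysql/<dump>.sql
```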

joplin-server

Joplin server keeps notes in sync across devices. The server uses a postgres instance to store the data; currently backups are handled on the client.

kanboard

Data is stored on the nfs. Data is split into plugins, board data and certs (certs aren't really relevant here because I use nginx ingress). The entire kanboard folder is fairly small and is pushed up to a bucket nightly; the db is found at kanboard/data/db.sqlite.

sonarr, radarr, prowlarr, flaresolverr, rclone and jellyfin

These apps are found in the arrs namespace and work by sharing volume mounts and connecting to an externally hosted seed box. Files are then rclone mounted into a shared directory and worked on by the arr apps.
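A hedged example of the kind of rclone mount involved; the remote name and mount point are made up:

```shell
# mount the seed box remote into the shared directory the arr apps watch;
# "seedbox:" and the paths are placeholders
rclone mount seedbox:/downloads /mnt/shared/downloads \
  --allow-other \
  --vfs-cache-mode writes \
  --daemon
```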

find-my-device (fmd)

Open source self-hosted device tracker that allows you to locate, ring, wipe and issue other commands to your device when it's lost. It aims to be a secure open source alternative to Google's Find My Device. server source and android app source. Uses ntfy to push notifications.

ntfy

The ntfy server pushes notifications to the find-my-device phone app and website.

readeck

A nice platform to save, read, tag and highlight online articles. Useful for understanding an article: when you revisit it, you can read through the highlights rather than the whole article.

Upgrading the cluster

There are 3 components we need to update:

  • talosctl (client and node)
  • kubectl (client)
  • kubernetes

You need to upgrade to the latest version of talosctl before jumping to the next version. You also need to recreate the image over at https://factory.talos.dev/ with the following values:

---
customization:
  extraKernelArguments:
    - net.ifnames=0
  systemExtensions:
    officialExtensions:
      - siderolabs/i915
      - siderolabs/intel-ice-firmware
      - siderolabs/intel-ucode
Then follow these steps:

  1. talosctl -n 192.168.222.200 etcd snapshot etcd.backup
  2. Point your kubeconfig away from the vip (192.168.222.200 -> 192.168.222.201)
  3. Upgrade each control node in turn: talosctl -n 192.168.222.201 upgrade --image factory.talos.dev/metal-installer-secureboot/c36e41ab205d24b3cc7c3aed91950af51ce00cb0f90429fc141888e64f1568d6:v1.xx.x
  4. Monitor the pods on each node: kubectl get pods -A --field-selector spec.nodeName=opti-1
  5. Once the node has rebooted and its pods have all come up again, repeat for the remaining nodes
  6. Now that all the nodes are on the latest version, upgrade your local kubectl and talosctl, then you can start the kubernetes upgrade
  7. talosctl --nodes 192.168.222.201 upgrade-k8s --to <next_k8s_version> will update the entire cluster
  8. Finally, reinstate the vip in your kubeconfig
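The per-node part of the steps above can be sketched as a loop; note that only opti-1 and 192.168.222.201 appear in this README, so the other node ips below are assumptions:

```shell
# snapshot etcd first (step 1)
talosctl -n 192.168.222.200 etcd snapshot etcd.backup

# upgrade one node at a time and watch its pods come back (steps 3-5);
# only 192.168.222.201 appears above -- the other ips are assumed
for node in 192.168.222.201 192.168.222.202 192.168.222.203; do
  talosctl -n "$node" upgrade \
    --image factory.talos.dev/metal-installer-secureboot/c36e41ab205d24b3cc7c3aed91950af51ce00cb0f90429fc141888e64f1568d6:v1.xx.x
  kubectl get pods -A --field-selector spec.nodeName=opti-1   # adjust hostname per node
done
```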