While I’ve self-hosted for many years, I started my Kubernetes journey with the aim of avoiding single points of failure. For my first foray into Kubernetes I started with MicroK8s, but I was never completely happy with the setup. I don’t know whether the Raspberry Pis were underpowered or whether MicroK8s’ use of dqlite was to blame, but there were times the cluster became unresponsive for no obvious reason. Time for a change. Jumping to the end of the story: Talos has been rock solid since the move.
Talos is a container-optimized Linux distro; a reimagining of Linux for distributed systems such as Kubernetes, designed to be as minimal as possible while still maintaining practicality. For example, there is no ssh access to the cluster. Instead, all cluster interaction goes through an API and a command-line tool called talosctl.
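To give a flavour of what that looks like, here are a few illustrative commands for the kind of things you would normally ssh in for (assuming a talosconfig is already set up and a node at 192.168.1.1):
# Inspect a node over the Talos API instead of ssh
talosctl --nodes 192.168.1.1 dmesg
talosctl --nodes 192.168.1.1 services
talosctl --nodes 192.168.1.1 logs kubelet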
To configure the Talos cluster, I used talhelper. Talhelper allows the cluster configuration to be captured in a single configuration file, called talconfig.yaml. This file contains the information for each machine in the cluster - at a minimum the hostname, whether it is a control plane node, its IP address and the disk Talos should be installed to. Many optional parameters also allow the configuration of each machine to be customised. For example, you can configure VIPs, routes, additional disks, Talos extensions, patches for the Talos configuration, etc. There is a template talconfig.yaml on the talhelper website. I used this as a reference, only adding the parameters I needed.
Two notes:
The configuration file looks like this:
# Template file
# https://github.com/budimanjojo/talhelper/blob/master/example/talconfig.yaml
clusterName: home-cluster
talosVersion: v1.9.4
kubernetesVersion: v1.32.2
endpoint: https://192.168.1.2:6443
# enable workers on your control plane nodes
allowSchedulingOnControlPlanes: false
cniConfig:
  name: none
clusterPodNets:
  - 10.244.0.0/16
clusterSvcNets:
  - 10.96.0.0/12
# patches:
#   - |-
#     # Patches go here
nodes:
  - hostname: cp1
    controlPlane: true
    ipAddress: 192.168.1.1
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Set up the control plane nodes to provide a vip for the kubernetes api server
    # Documentation @ https://www.talos.dev/v1.9/talos-guides/network/vip/
    # Default setup is to use predictable interface names: https://www.talos.dev/v1.9/talos-guides/network/predictable-interface-names/
    # Different devices might need a different interface name: https://www.talos.dev/v1.9/talos-guides/network/device-selector/
    networkInterfaces:
      - interface: end0
        dhcp: true
        vip:
          ip: 192.168.1.9
  - hostname: cp2
    controlPlane: true
    ipAddress: 192.168.1.2
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Set up the control plane nodes to provide a vip for the kubernetes api server
    networkInterfaces:
      - interface: end0
        dhcp: true
        vip:
          ip: 192.168.1.9
  - hostname: cp3
    controlPlane: true
    ipAddress: 192.168.1.3
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Set up the control plane nodes to provide a vip for the kubernetes api server
    networkInterfaces:
      - interface: end0
        dhcp: true
        vip:
          ip: 192.168.1.9
  ## Worker Nodes
  # - nodeTaints cannot be set by the worker node itself
  #   https://github.com/siderolabs/talos/discussions/9895
  # - nodeLabels prefixed with kubernetes.io and others cannot be set by the worker node itself
  #   https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#noderestriction
  #   https://github.com/siderolabs/talos/issues/6750
  # - confirm node disk configuration with
  #   talosctl get disks -n 192.168.1.21 --endpoints 192.168.1.1 --talosconfig=./clusterconfig/talosconfig --insecure
  - hostname: w1
    controlPlane: false
    ipAddress: 192.168.1.21
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: w2
    controlPlane: false
    ipAddress: 192.168.1.22
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: w3
    controlPlane: false
    ipAddress: 192.168.1.23
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: w4
    controlPlane: false
    ipAddress: 192.168.1.24
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  ## Storage Nodes - Rook Ceph
  - hostname: ceph1
    controlPlane: false
    ipAddress: 192.168.1.11
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph2
    controlPlane: false
    ipAddress: 192.168.1.12
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph3
    controlPlane: false
    ipAddress: 192.168.1.13
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph4
    controlPlane: false
    ipAddress: 192.168.1.14
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph5
    controlPlane: false
    ipAddress: 192.168.1.15
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph6
    controlPlane: false
    ipAddress: 192.168.1.16
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
# controlPlane:
#   patches:
#     - |-
#       # Patches go here
# worker:
#   patches:
#     - |-
#       # Patches go here
Breaking this down, we first define the cluster. I’m not installing a CNI at this point, but I do call out the default network ranges manually - mainly for documentation, so we can reference these values later. I also prevent workloads from being scheduled on the control plane nodes. Again, this is the default, but good to call out nonetheless.
clusterName: home-cluster
talosVersion: v1.9.4
kubernetesVersion: v1.32.2
endpoint: https://192.168.1.2:6443
# enable workers on your control plane nodes
allowSchedulingOnControlPlanes: false
cniConfig:
  name: none
clusterPodNets:
  - 10.244.0.0/16
clusterSvcNets:
  - 10.96.0.0/12
The next section covers two types of nodes - control plane nodes and worker nodes. Strictly speaking, there are also two types of workers: true workers and the Rook Ceph storage nodes.
The control plane nodes are dedicated Raspberry Pi 4Bs with 8GB of RAM and an SSD attached via USB.
nodes:
  - hostname: cp1
    controlPlane: true
    ipAddress: 192.168.1.1
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Set up the raspberry pi to provide a vip for the kubernetes api server
    networkInterfaces:
      - interface: end0
        dhcp: true
        vip:
          ip: 192.168.1.9
The configuration is pretty self-explanatory. The nodes all pick up a static IP address via DHCP, and Talos configures a “Virtual” IP (VIP) address for access to the Kubernetes API server, providing high availability with no other resources required. The control plane machines vie for control of the shared IP address using etcd elections. There can be only one owner of the IP address at any given time - if that owner disappears or becomes non-responsive, another owner is chosen and takes up the IP address.
Note: we don’t use the VIP address to create the cluster because the VIP is only active once the cluster is up and running.
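As a quick sanity check once the cluster is up (purely illustrative, using the IPs from my setup), you can ask Talos which addresses each control plane node currently holds - whichever node owns the VIP will list 192.168.1.9:
talosctl --talosconfig=./clusterconfig/talosconfig --endpoints 192.168.1.1 --nodes 192.168.1.1,192.168.1.2,192.168.1.3 get addresses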
The worker nodes are used HP EliteDesk G2 Desktop Mini PCs (i5-6500 @ 3.6GHz, 16GB RAM), bought from eBay and fitted with new SSDs. I’ve been really impressed with these.
- hostname: w1
  controlPlane: false
  ipAddress: 192.168.1.21
  installDisk: /dev/sda
  nameservers:
    - 192.168.0.11
Hardware-wise, the storage nodes are the same as the worker nodes, but each has an additional NVMe drive that provides a disk for Ceph.
- hostname: ceph1
  controlPlane: false
  ipAddress: 192.168.1.11
  installDisk: /dev/sda
  nameservers:
    - 192.168.0.11
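As noted in the comments in the full talconfig.yaml above, talosctl get disks can be used to confirm the disks a node can see - handy here to check that the extra NVMe drive shows up on a storage node. This mirrors the command from the config comments, pointed at ceph1:
talosctl get disks -n 192.168.1.11 --endpoints 192.168.1.1 --talosconfig=./clusterconfig/talosconfig --insecure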
First, gather the tools we need:
Get talosctl from https://github.com/siderolabs/talos/releases
wget https://github.com/siderolabs/talos/releases/download/v1.9.4/talosctl-linux-amd64
sudo install talosctl-linux-amd64 /usr/local/bin/talosctl
Get kubectl from https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install kubectl /usr/local/bin/kubectl
Get talhelper from https://budimanjojo.github.io/talhelper/latest/
wget https://github.com/budimanjojo/talhelper/releases/download/v3.0.20/talhelper_linux_amd64.tar.gz
tar -xvf talhelper_linux_amd64.tar.gz
sudo install talhelper /usr/local/bin/talhelper
rm LICENSE README.md
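A quick sanity check that everything landed on the path (I believe all three tools support a client-side version flag; exact output will differ):
talosctl version --client
kubectl version --client
talhelper --version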
Once we have the talconfig.yaml, there are only a couple of steps to create the cluster. First, generate the cluster secrets and encrypt them with sops:
# Warning: do not regenerate secrets on an existing cluster.
talhelper gensecret > talsecret.sops.yaml
# and encrypt it with sops
sops -e -i talsecret.sops.yaml
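One assumption I’m glossing over: sops needs a key to encrypt with. I won’t cover key management here, but a minimal sketch using age looks something like this (the key path is the sops default on Linux, and the age1... recipient is a placeholder for the public key printed by age-keygen):
mkdir -p ~/.config/sops/age
age-keygen -o ~/.config/sops/age/keys.txt
cat > .sops.yaml <<'EOF'
creation_rules:
  - path_regex: talsecret\.sops\.yaml$
    age: age1yourpublickeygoeshere
EOF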
Create the node configuration files:
talhelper genconfig
This will create a clusterconfig directory with a YAML file containing the configuration for each node. talhelper will automatically generate a .gitignore file to prevent pushing the files containing keys to a Git repository. In my case, these files are created:
generated config for cp1 in ./clusterconfig/home-cluster-cp1.yaml
generated config for cp2 in ./clusterconfig/home-cluster-cp2.yaml
generated config for cp3 in ./clusterconfig/home-cluster-cp3.yaml
generated config for w1 in ./clusterconfig/home-cluster-w1.yaml
generated config for w2 in ./clusterconfig/home-cluster-w2.yaml
generated config for w3 in ./clusterconfig/home-cluster-w3.yaml
generated config for w4 in ./clusterconfig/home-cluster-w4.yaml
generated config for ceph1 in ./clusterconfig/home-cluster-ceph1.yaml
generated config for ceph2 in ./clusterconfig/home-cluster-ceph2.yaml
generated config for ceph3 in ./clusterconfig/home-cluster-ceph3.yaml
generated config for ceph4 in ./clusterconfig/home-cluster-ceph4.yaml
generated config for ceph5 in ./clusterconfig/home-cluster-ceph5.yaml
generated config for ceph6 in ./clusterconfig/home-cluster-ceph6.yaml
generated client config in ./clusterconfig/talosconfig
generated .gitignore file in ./clusterconfig/.gitignore
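Before applying anything, the generated machine configs can be sanity-checked with talosctl - for example against the first control plane config, using metal mode since these are bare-metal machines:
talosctl validate --config ./clusterconfig/home-cluster-cp1.yaml --mode metal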
Generate and apply the configuration:
talhelper gencommand apply --extra-flags --insecure
This creates the commands - we can run them individually and monitor the results, or run them all at once (there’s a one-liner for that after the list).
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.1 --file=./clusterconfig/home-cluster-cp1.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.2 --file=./clusterconfig/home-cluster-cp2.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.3 --file=./clusterconfig/home-cluster-cp3.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.21 --file=./clusterconfig/home-cluster-w1.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.22 --file=./clusterconfig/home-cluster-w2.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.23 --file=./clusterconfig/home-cluster-w3.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.24 --file=./clusterconfig/home-cluster-w4.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.11 --file=./clusterconfig/home-cluster-ceph1.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.12 --file=./clusterconfig/home-cluster-ceph2.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.13 --file=./clusterconfig/home-cluster-ceph3.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.14 --file=./clusterconfig/home-cluster-ceph4.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.15 --file=./clusterconfig/home-cluster-ceph5.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.16 --file=./clusterconfig/home-cluster-ceph6.yaml --insecure;
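The one-liner mentioned above: since gencommand just prints shell commands, they can be piped straight into a shell (I’d eyeball the output first):
talhelper gencommand apply --extra-flags --insecure | bash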
Note: The --insecure flag allows the configuration to be applied without client authentication, which is necessary while a node has not yet been installed (it is still in maintenance mode). By default, talhelper does not include it in the generated commands; the --extra-flags argument adds this flag.
Before the cluster can form, it needs to be bootstrapped. I applied the node configuration to the first node and then bootstrapped the cluster. Again, talhelper gives the commands we need:
talhelper gencommand apply --extra-flags --insecure
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.1 --file=./clusterconfig/home-cluster-cp1.yaml;
talhelper gencommand bootstrap
talosctl bootstrap --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.1;
Once the bootstrap has completed, we can check the health of the cluster, watch the dashboard and list the cluster members:
talosctl --nodes 192.168.1.2 --endpoints 192.168.1.2 --talosconfig=./clusterconfig/talosconfig health
talosctl --nodes 192.168.1.2 --endpoints 192.168.1.2 --talosconfig=./clusterconfig/talosconfig dashboard
talosctl --talosconfig=./clusterconfig/talosconfig get members
talosctl can also generate the kubeconfig file needed to access the cluster:
talosctl kubeconfig --talosconfig=./clusterconfig/talosconfig --nodes 192.168.1.1
Now we can use kubectl:
kubectl get node
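At this point the nodes will be listed but report NotReady, because no CNI has been installed yet. I keep a watch running while setting up Calico in the next step:
kubectl get nodes --watch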
Calico is a networking and security solution that enables Kubernetes workloads and non-Kubernetes/legacy workloads to communicate seamlessly and securely. I found a few blog posts covering Cilium on Talos but not Calico, so here are the Calico instructions.
Download the Calico CNI manifest:
wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/calico.yaml
Search for CALICO_IPV4POOL_CIDR and set it to:
- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"
Note: Changing this value after installation will have no effect.
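A quick check that the edit took (in the stock manifest this variable is commented out, so make sure both the name and value lines are uncommented):
grep -n -A1 CALICO_IPV4POOL_CIDR calico.yaml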
Then apply the manifest:
kubectl create -f calico.yaml
That’s it! Once the Calico pods spin up, the remaining node configurations can be applied without the --insecure flag and we’ll have a fully working cluster!
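A couple of illustrative checks that everything came up - the calico-node pods should reach Running and the nodes should flip to Ready:
kubectl get pods -n kube-system -l k8s-app=calico-node
kubectl get nodes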
Thanks for reading!