Setting up a Talos Kubernetes cluster with talhelper



While I’ve self-hosted for many years, I started my Kubernetes journey with the aim of avoiding single points of failure. For my first foray into Kubernetes, I started with microk8s but was never completely happy with the setup. I don’t know whether the Raspberry Pis were underpowered or whether microk8s’s use of dqlite was to blame, but there were times the cluster became unresponsive for no obvious reason. Time for a change. Jumping to the end of the story: Talos has been rock solid since the move.

Talos

Talos is a container-optimized Linux distro: a reimagining of Linux for distributed systems such as Kubernetes, designed to be as minimal as possible while still remaining practical. For example, there is no SSH access to the nodes; instead, all cluster interaction happens through an API and a command-line tool, called talosctl.

Configuring a Talos cluster with Talhelper

To configure the Talos cluster, I used talhelper. Talhelper allows the cluster configuration to be captured in a single configuration file, called talconfig.yaml. This file contains the information for each machine in the cluster, at a minimum:

  - hostname
  - controlPlane (whether the node is a control plane node)
  - ipAddress
  - installDisk

Many optional parameters also allow the configuration of each machine to be customised. For example, you can configure VIPs, routes, additional disks, Talos extensions, patches for the Talos configuration, and so on. There is a template talconfig.yaml on the talhelper website; I used this as a reference, only adding the parameters I needed (a small illustrative sketch follows the notes below).

Two notes:

  1. nodeTaints cannot be set by the worker node itself
  2. nodeLabels with prefixes such as kubernetes.io cannot be set by the worker node itself
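
For illustration only, here is a hedged sketch of how a single worker node might use some of these optional fields. The nodeLabels and patches keys come from the talhelper template linked above, but the hostname, label and patch values here are made up for this example, so check the schema for your talhelper version before reusing any of it.

  - hostname: w9
    controlPlane: false
    ipAddress: 192.168.1.29
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Custom labels are fine; kubernetes.io-prefixed labels are not (see note 2 above)
    nodeLabels:
      rack: "1"
    # A node-level patch applied only to this machine (example sysctl)
    patches:
      - |-
        machine:
          sysctls:
            vm.max_map_count: "262144"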

The configuration file looks like this:

# Template file
# https://github.com/budimanjojo/talhelper/blob/master/example/talconfig.yaml
clusterName: home-cluster
talosVersion: v1.9.4
kubernetesVersion: v1.32.2
endpoint: https://192.168.1.2:6443
# enable workers on your control plane nodes
allowSchedulingOnControlPlanes: false
cniConfig:
  name: none
clusterPodNets:
  - 10.244.0.0/16
clusterSvcNets:
  - 10.96.0.0/12
# patches:
#   - |-
#     # Patches go here

nodes:
  - hostname: cp1
    controlPlane: true
    ipAddress: 192.168.1.1
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Set up the control plane nodes to provide a vip for the kubernetes api server
    # Documentation @ https://www.talos.dev/v1.9/talos-guides/network/vip/
    # Default setup is to use predictable interface names: https://www.talos.dev/v1.9/talos-guides/network/predictable-interface-names/
    # Different devices might need a different interface name: https://www.talos.dev/v1.9/talos-guides/network/device-selector/
    networkInterfaces:
      - interface: end0
        dhcp: true
        vip:
          ip: 192.168.1.9
  - hostname: cp2
    controlPlane: true
    ipAddress: 192.168.1.2
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Set up the control plane nodes to provide a vip for the kubernetes api server
    networkInterfaces:
      - interface: end0
        dhcp: true
        vip:
          ip: 192.168.1.9
  - hostname: cp3
    controlPlane: true
    ipAddress: 192.168.1.3
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Set up the control plane nodes to provide a vip for the kubernetes api server
    networkInterfaces:
      - interface: end0
        dhcp: true
        vip:
          ip: 192.168.1.9

  ## Worker Nodes
  # - nodeTaints cannot be set by the worker node itself
  #   https://github.com/siderolabs/talos/discussions/9895
  # - nodeLabels prefixed with kubernetes.io and others cannot be set by the worker node itself
  #   https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#noderestriction
  #   https://github.com/siderolabs/talos/issues/6750
  # - confirm node disk configuration with
  #   talosctl get disks -n 192.168.1.21 --endpoints 192.168.1.1 --talosconfig=./clusterconfig/talosconfig --insecure
  - hostname: w1
    controlPlane: false
    ipAddress: 192.168.1.21
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: w2
    controlPlane: false
    ipAddress: 192.168.1.22
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: w3
    controlPlane: false
    ipAddress: 192.168.1.23
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: w4
    controlPlane: false
    ipAddress: 192.168.1.24
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11

  ## Storage Nodes - Rook Ceph
  - hostname: ceph1
    controlPlane: false
    ipAddress: 192.168.1.11
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph2
    controlPlane: false
    ipAddress: 192.168.1.12
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph3
    controlPlane: false
    ipAddress: 192.168.1.13
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph4
    controlPlane: false
    ipAddress: 192.168.1.14
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph5
    controlPlane: false
    ipAddress: 192.168.1.15
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
  - hostname: ceph6
    controlPlane: false
    ipAddress: 192.168.1.16
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11

# controlPlane:
#   patches:
#     - |-
#       # Patches go here

# worker:
#   patches:
#     - |-
#       # Patches go here

Breaking this down, we first define the cluster. I’m not installing a CNI at this point, but I do call out the default network ranges manually, mainly as documentation so we can reference these values later. I also prevent workloads from being scheduled on the control plane nodes. Again, this is the default, but it is good to call out nonetheless.

clusterName: home-cluster
talosVersion: v1.9.4
kubernetesVersion: v1.32.2
endpoint: https://192.168.1.2:6443
# enable workers on your control plane nodes
allowSchedulingOnControlPlanes: false
cniConfig:
  name: none
clusterPodNets:
  - 10.244.0.0/16
clusterSvcNets:
  - 10.96.0.0/12

The next section covers two types of nodes: control plane nodes and worker nodes. Strictly speaking, there are two kinds of workers as well: true workers and the Rook Ceph storage nodes.

Typical Control Plane Node

These are dedicated Raspberry Pi 4Bs with 8GB RAM and an SSD attached via USB.

nodes:
  - hostname: cp1
    controlPlane: true
    ipAddress: 192.168.1.1
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11
    # Set up the raspberry pi to provide a vip for the kubernetes api server
    networkInterfaces:
      - interface: end0
        dhcp: true
        vip:
          ip: 192.168.1.9

The configuration is pretty self-explanatory. The nodes all pick up a static IP address via DHCP, and Talos configures a “virtual” IP (VIP) address for accessing the Kubernetes API server, providing high availability with no additional resources required. The control plane machines vie for control of the shared IP address using etcd elections. There can be only one owner of the IP address at any given time; if that owner disappears or becomes unresponsive, another owner is elected and takes over the IP address.

Note: we don’t use the VIP address to create the cluster, because the VIP only becomes active once the cluster is up and running.
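
If you are curious which control plane node currently holds the VIP once the cluster is running, the network address resources can be inspected. This is just a quick sanity check and assumes the addresses resource is available under that name in your Talos version:

talosctl --talosconfig=./clusterconfig/talosconfig --endpoints 192.168.1.1 --nodes 192.168.1.1,192.168.1.2,192.168.1.3 get addresses | grep 192.168.1.9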

Typical Worker Node

These are used HP EliteDesk G2 Desktop Mini PCs (i5-6500 @ 3.6 GHz, 16 GB RAM), bought used from eBay and fitted with new SSDs. I’ve been really impressed with them.

  - hostname: w1
    controlPlane: false
    ipAddress: 192.168.1.21
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11

Typical Rook-Ceph Storage Node

Hardware-wise, these are the same as the worker nodes, but each has an additional NVMe drive that provides a disk for Ceph.

  - hostname: ceph1
    controlPlane: false
    ipAddress: 192.168.1.11
    installDisk: /dev/sda
    nameservers:
      - 192.168.0.11

Install talosctl, kubectl and talhelper

First gather the tools we need:

Install talosctl

Get talosctl from https://github.com/siderolabs/talos/releases

wget https://github.com/siderolabs/talos/releases/download/v1.9.4/talosctl-linux-amd64
sudo install talosctl-linux-amd64 /usr/local/bin/talosctl 

Install kubectl

Get kubectl from https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install kubectl /usr/local/bin/kubectl

Install Talhelper

Get talhelper from https://budimanjojo.github.io/talhelper/latest/

wget https://github.com/budimanjojo/talhelper/releases/download/v3.0.20/talhelper_linux_amd64.tar.gz
tar -xvf talhelper_linux_amd64.tar.gz
sudo install talhelper /usr/local/bin/talhelper
rm LICENSE README.md
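
As a quick sanity check that the tools are installed and on the PATH (exact output will vary with the versions you downloaded; talhelper -h simply prints the usage, which is enough to confirm the binary works):

talosctl version --client
kubectl version --client
talhelper -h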

Creating the cluster

Once we have the talconfig.yaml, there are only a couple of steps to create the cluster.

# Warning: do not regenerate secrets on an existing cluster.
talhelper gensecret > talsecret.sops.yaml
# and encrypt it with sops
sops -e -i talsecret.sops.yaml
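
For sops to encrypt the file, it needs to know which key to use. A minimal .sops.yaml alongside talconfig.yaml might look like the following, assuming you encrypt with an age key; the recipient shown is a placeholder, not a real key.

# .sops.yaml
creation_rules:
  - path_regex: talsecret\.sops\.yaml$
    # Replace with your own age public key
    age: age1placeholderplaceholderplaceholderplaceholderplaceholder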

Create the node configuration files:

talhelper genconfig

This will create a clusterconfig directory with a YAML file containing the configuration for each node. talhelper also generates a .gitignore file to prevent the files containing keys from being pushed to a Git repository. In my case, these files are created:

generated config for cp1 in ./clusterconfig/home-cluster-cp1.yaml
generated config for cp2 in ./clusterconfig/home-cluster-cp2.yaml
generated config for cp3 in ./clusterconfig/home-cluster-cp3.yaml
generated config for w1 in ./clusterconfig/home-cluster-w1.yaml
generated config for w2 in ./clusterconfig/home-cluster-w2.yaml
generated config for w3 in ./clusterconfig/home-cluster-w3.yaml
generated config for w4 in ./clusterconfig/home-cluster-w4.yaml
generated config for ceph1 in ./clusterconfig/home-cluster-ceph1.yaml
generated config for ceph2 in ./clusterconfig/home-cluster-ceph2.yaml
generated config for ceph3 in ./clusterconfig/home-cluster-ceph3.yaml
generated config for ceph4 in ./clusterconfig/home-cluster-ceph4.yaml
generated config for ceph5 in ./clusterconfig/home-cluster-ceph5.yaml
generated config for ceph6 in ./clusterconfig/home-cluster-ceph6.yaml
generated client config in ./clusterconfig/talosconfig
generated .gitignore file in ./clusterconfig/.gitignore

Generate and apply the configuration:

talhelper gencommand apply --extra-flags --insecure

This creates the commands - we can run these individually and monitor the results, or all at once!

talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.1 --file=./clusterconfig/home-cluster-cp1.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.2 --file=./clusterconfig/home-cluster-cp2.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.3 --file=./clusterconfig/home-cluster-cp3.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.21 --file=./clusterconfig/home-cluster-w1.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.22 --file=./clusterconfig/home-cluster-w2.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.23 --file=./clusterconfig/home-cluster-w3.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.24 --file=./clusterconfig/home-cluster-w4.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.11 --file=./clusterconfig/home-cluster-ceph1.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.12 --file=./clusterconfig/home-cluster-ceph2.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.13 --file=./clusterconfig/home-cluster-ceph3.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.14 --file=./clusterconfig/home-cluster-ceph4.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.15 --file=./clusterconfig/home-cluster-ceph5.yaml --insecure;
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.16 --file=./clusterconfig/home-cluster-ceph6.yaml --insecure;

Note: the --insecure flag allows the configuration to be applied without authentication, which is necessary while Talos has not yet been installed on the node and it is still running in maintenance mode. By default, talhelper does not include it in the generated commands; the --extra-flags argument adds it.
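
If you would rather run all of the generated commands in one go, they are plain shell, so they can be piped straight into a shell. This is just a convenience; review the output first before doing it.

talhelper gencommand apply --extra-flags --insecure | bash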

Bootstrap the cluster

Before the cluster can form, it needs to be bootstrapped. I applied the node configuration to the first node and then bootstrapped the cluster. Again, talhelper gives us the commands we need:

talhelper gencommand apply --extra-flags --insecure
talosctl apply-config --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.1 --file=./clusterconfig/home-cluster-cp1.yaml;
talhelper gencommand bootstrap
talosctl bootstrap --talosconfig=./clusterconfig/talosconfig --nodes=192.168.1.1;

Useful commands to check Talos cluster status

Health check

talosctl --nodes 192.168.1.2 --endpoints 192.168.1.2 --talosconfig=./clusterconfig/talosconfig health

Dashboard

talosctl --nodes 192.168.1.2 --endpoints 192.168.1.2 --talosconfig=./clusterconfig/talosconfig dashboard

Cluster members

talosctl --talosconfig=./clusterconfig/talosconfig get members

Configuring kubectl

talosctl can also generate the kubeconfig file needed to access the cluster:

talosctl kubeconfig --talosconfig=./clusterconfig/talosconfig --nodes 192.168.1.1

Now we can use kubectl. Note that until a CNI is installed, the nodes will show a NotReady status; we fix that next:

kubectl get node

Adding Calico as a CNI

Calico is a networking and security solution that enables Kubernetes workloads and non-Kubernetes/legacy workloads to communicate seamlessly and securely. I found a few blog posts for setting up Cilium but not Calico, so here are the Calico instructions.

Download the Calico CNI manifest:

wget https://raw.githubusercontent.com/projectcalico/calico/v3.29.2/manifests/calico.yaml

Search for CALICO_IPV4POOL_CIDR (uncommenting it if necessary) and set it to match the pod network defined earlier:

            - name: CALICO_IPV4POOL_CIDR
              value: "10.244.0.0/16"

Note: changing this value after installation will have no effect.

Then apply the manifest:

kubectl create -f calico.yaml

That’s it. Once the Calico pods spin up, the remaining node configurations can be applied without the --insecure flag, and we’ll have a fully working cluster!
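
To confirm everything is healthy, a couple of quick checks; the Calico manifest installs its pods into the kube-system namespace, and once the calico-node pods are Running the nodes should move to Ready.

kubectl get pods -n kube-system
kubectl get nodes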

Thanks for reading!