In this article, I will take you through the steps to back up and restore the Kubernetes etcd database, but before that let's try to understand what etcd is and why it is important to back it up. According to the official documentation, etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.
Now to the second question: why is it important to back up the etcd database? Simply because it contains the entire Kubernetes cluster data. The snapshot file holds all the Kubernetes state and critical information. There are basically two ways to back up an etcd cluster - the etcd built-in snapshot and a volume snapshot. Here we will use the etcd built-in snapshot feature to take the backup.
How to Backup and Restore Kubernetes ETCD database
Also Read: How to take a Kubernetes Cluster node out for Maintenance
Step 1: Check ETCD version
Before starting the backup, always verify the etcd version by using the kubectl -n kube-system describe po etcd-master | grep Image command. Here etcd-master is our etcd pod. The pod name might be different for you, so change it to the appropriate one.
root@cyberithub:~# kubectl -n kube-system describe po etcd-master | grep Image
Image: k8s.gcr.io/etcd:3.4.13-0
Image ID: docker-pullable://k8s.gcr.io/etcd@sha256:4ad90a11b55313b182afc186b9876c8e891531b8db4c9bf1541953021618d0e2
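If you prefer a one-liner, a jsonpath query like the one below should also print the etcd image tag directly. This is just an alternative to the describe command above; again, replace etcd-master with your own etcd pod name.
root@cyberithub:~# kubectl -n kube-system get pod etcd-master -o jsonpath='{.spec.containers[0].image}'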
Step 2: Take Backup
Once the etcd version is noted, it is time to save a snapshot at a particular location by using the below etcdctl snapshot save command. You can also notice that we are passing arguments like the CA certificate path, server certificate path, key path and endpoints along with the snapshot save command. Without specifying all these details the snapshot cannot be taken, so please provide all the information correctly.
root@cyberithub:~# ETCDCTL_API=3 etcdctl snapshot save --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 /opt/snapshot-data.db
Snapshot saved at /opt/snapshot-data.db
Below is the list of arguments you need to pass with the above backup command :-
--cacert: CA certificate path
--cert: Server certificate path
--key: Server key path
--endpoints: etcd client endpoint URL
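Optionally, before relying on the backup, you can run a quick sanity check on the snapshot file with the etcdctl snapshot status command. The path below assumes the snapshot was saved to /opt/snapshot-data.db as in the previous step; the output should show the hash, revision, total keys and size of the snapshot.
root@cyberithub:~# ETCDCTL_API=3 etcdctl snapshot status /opt/snapshot-data.db --write-out=table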
Step 3: Check the Application
Now let's say some disaster occurred and all your applications are gone. If you try to check any pods, deployments or services using the kubectl get pods,deployments,svc command, then all you see is the below output.
root@cyberithub:~# kubectl get pods,deployments,svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.88.0.1 <none> 443/TCP 85s
Step 4: Restore Backup
To restore the cluster data, we need to use the etcdctl snapshot restore command to restore the data from the backup we took earlier. You can also notice the arguments we are passing to the etcdctl snapshot restore command; these need to be given correctly as well.
root@cyberithub:~# ETCDCTL_API=3 etcdctl snapshot restore --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 --data-dir="/var/lib/etcdbkp" --initial-cluster="master=https://127.0.0.1:2380" --name="master" --initial-advertise-peer-urls="https://127.0.0.1:2380" /opt/snapshot-data.db
2022-01-10 06:17:11.995906 I | mvcc: restore compact to 839
2022-01-10 06:17:12.003792 I | etcdserver/membership: added member a874c87fd42044f [https://127.0.0.1:2380] to cluster c9be114fc2da2776
Below is the list of arguments you need to pass with the above restore command :-
--cacert: CA certificate path
--cert: Server certificate path
--key: Server key path
--endpoints: etcd client endpoint URL
--data-dir: Path of the new data directory into which the snapshot is restored
--initial-cluster: Initial cluster configuration for bootstrapping
--initial-advertise-peer-urls: List of this member's peer URLs to advertise to the rest of the cluster
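As a quick check that the restore worked, you can list the contents of the new data directory. The path below assumes the same --data-dir=/var/lib/etcdbkp used in the restore command above; you should see a member directory containing the snap and wal folders.
root@cyberithub:~# ls -l /var/lib/etcdbkp/member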
Step 5: Modify etcd.yaml
But wait you are not done yet. There is one more step you need to perform. You need to specify the volume mountpath and host path as well in etcd.yaml
configuration file. Usually this file resides under /etc/kubernetes/manifests
directory. You can use vi editor to open the file and do the necessary changes as shown below. Once you save and exit the file, you will see that all pods, deployments and services will start getting created.
root@cyberithub:~# vi /etc/kubernetes/manifests/etcd.yaml
......................................................
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.13.53.9:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcdbkp
    - --initial-advertise-peer-urls=https://10.13.53.9:2380
    - --initial-cluster=master=https://10.13.53.9:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.13.53.9:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://10.13.53.9:2380
    - --name=master
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
..................................................
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcdbkp
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcdbkp
      type: DirectoryOrCreate
    name: etcd-data
status: {}
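Since etcd runs as a static pod, the kubelet should notice the change to etcd.yaml and recreate the pod on its own. If you want to follow the progress, you can watch the kube-system pods with the command below (it may take a minute or two for the API server to respond again while etcd restarts).
root@cyberithub:~# kubectl -n kube-system get pods -w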
Step 6: Check ETCD Container
After waiting for a few minutes, you can check whether the etcd container came up by using the docker ps -a | grep etcd command. If you see output like below, then it came up successfully.
root@cyberithub:~# docker ps -a | grep etcd
b16bee0d8755 0369cf4303ff "etcd --advertise-cl…" 4 minutes ago Up 4 minutes k8s_etcd_etcd-master_kube-system_31f1a55a7c052d741276c3bdaab988b1_0
aed3a6baf7d1 k8s.gcr.io/pause:3.2 "/pause" 4 minutes ago Up 4 minutes k8s_POD_etcd-master_kube-system_31f1a55a7c052d741276c3bdaab988b1_0
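Note that docker ps only works on nodes that use the Docker runtime. If your cluster runs on containerd or CRI-O instead, a roughly equivalent check (assuming crictl is installed and configured on the node) would be the command below.
root@cyberithub:~# crictl ps | grep etcd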
Step 7: Check Member List
Next we need to verify the member list by using the below etcdctl member list command.
root@cyberithub:~# ETCDCTL_API=3 etcdctl member list --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379
a874c87fd42044f, started, master, https://127.0.0.1:2380, https://10.13.53.9:2379
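You can also confirm that the restored etcd endpoint is healthy by running the etcdctl endpoint health and etcdctl endpoint status commands with the same certificate flags as before. The status output should show the database size and the current leader of the cluster.
root@cyberithub:~# ETCDCTL_API=3 etcdctl endpoint health --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379
root@cyberithub:~# ETCDCTL_API=3 etcdctl endpoint status --write-out=table --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379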
Step 8: Check Pods, Services and Deployments
By this time, you should have all the pods, services and deployments restored. You can quickly verify that by running the kubectl get pods,svc,deployments command as shown below. If you see everything in the Running state, then it means all the applications were restored successfully.
root@cyberithub:~# kubectl get pods,svc,deployments
NAME READY STATUS RESTARTS AGE
pod/test-746c87566d-qk447 1/1 Running 0 40m
pod/test-746c87566d-r9m98 1/1 Running 0 40m
pod/test-746c87566d-w7wbk 1/1 Running 0 40m
pod/hello-75f847bf79-9d6ht 1/1 Running 0 40m
pod/hello-75f847bf79-fttp9 1/1 Running 0 40m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/test-service NodePort 10.108.78.129 <none> 80:30082/TCP 40m
service/kubernetes ClusterIP 10.88.0.1 <none> 443/TCP 45m
service/hello-service NodePort 10.153.81.167 <none> 80:30080/TCP 40m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/test 3/3 3 3 40m
deployment.apps/hello 2/2 2 2 40m