In this article, I will take you through the steps to back up and restore the Kubernetes etcd database, but before that let's try to understand what etcd is and why it is important to back it up. According to the official documentation, etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.
Now to the second question: why is it important to back up the etcd database? Simply because it contains the entire Kubernetes cluster data. The snapshot file holds all the Kubernetes state and critical information. There are basically two ways to back up an etcd cluster - the etcd built-in snapshot and a volume snapshot. Here we will use the etcd built-in snapshot feature to take the backup.
How to Backup and Restore Kubernetes ETCD database
Also Read: How to take a Kubernetes Cluster node out for Maintenance
Step 1: Check ETCD version
Before starting the backup, always verify the etcd version by using the kubectl -n kube-system describe po etcd-master | grep Image command. Here etcd-master is our etcd pod. The pod name might be different for you, so change it to the appropriate one.
root@cyberithub:~# kubectl -n kube-system describe po etcd-master | grep Image
Image: k8s.gcr.io/etcd:3.4.13-0
Image ID: docker-pullable://k8s.gcr.io/etcd@sha256:4ad90a11b55313b182afc186b9876c8e891531b8db4c9bf1541953021618d0e2
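If you prefer a one-liner, a jsonpath query like the one below should also print the etcd image tag directly. This is just an alternative to the describe command above; again, replace etcd-master with your own etcd pod name.
root@cyberithub:~# kubectl -n kube-system get pod etcd-master -o jsonpath='{.spec.containers[0].image}'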
Step 2: Take Backup
Once the etcd version is noted, it is time to save a snapshot at a particular location by using the below etcdctl snapshot save command. You can also notice that we are passing arguments like the CA certificate path, server certificate path, key path and endpoints along with the snapshot save command. Without specifying all these details the snapshot cannot be taken, so please provide all the information correctly.
root@cyberithub:~# ETCDCTL_API=3 etcdctl snapshot save --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 /opt/snapshot-data.db
Snapshot saved at /opt/snapshot-data.db
Below is the list of arguments you need to pass with the above backup command :-
--cacert: CA certificate path
--cert: Server certificate path
--key: Server key path
--endpoints: etcd client endpoint URL
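Optionally, before relying on the backup, you can run a quick sanity check on the snapshot file with the etcdctl snapshot status command. The path below assumes the snapshot was saved to /opt/snapshot-data.db as in the previous step; the output should show the hash, revision, total keys and size of the snapshot.
root@cyberithub:~# ETCDCTL_API=3 etcdctl snapshot status /opt/snapshot-data.db --write-out=table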
Step 3: Check the Application
Now let's say some disaster occurred and all your applications are gone. If you try to check any pods, deployments or services using the kubectl get pods,deployments,svc command, then all you see is the below output.
root@cyberithub:~# kubectl get pods,deployments,svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.88.0.1 <none> 443/TCP 85s
Step 4: Restore Backup
To restore the cluster data, we need to use the etcdctl snapshot restore command to restore the data from the backup we took earlier. You can also notice the arguments we are passing to the etcdctl snapshot restore command; these need to be given correctly as well.
root@cyberithub:~# ETCDCTL_API=3 etcdctl snapshot restore --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379 --data-dir="/var/lib/etcdbkp" --initial-cluster="master=https://127.0.0.1:2380" --name="master" --initial-advertise-peer-urls="https://127.0.0.1:2380" /opt/snapshot-data.db
2022-01-10 06:17:11.995906 I | mvcc: restore compact to 839
2022-01-10 06:17:12.003792 I | etcdserver/membership: added member a874c87fd42044f [https://127.0.0.1:2380] to cluster c9be114fc2da2776
Below is the list of arguments you need to pass with the above restore command :-
--cacert: CA certificate path
--cert: Server certificate path
--key: Server key path
--endpoints: etcd client endpoint URL
--data-dir: Path of the new data directory into which the snapshot is restored
--initial-cluster: Initial cluster configuration for bootstrapping
--initial-advertise-peer-urls: List of this member's peer URLs to advertise to the rest of the cluster
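As a quick check that the restore worked, you can list the contents of the new data directory. The path below assumes the same --data-dir=/var/lib/etcdbkp used in the restore command above; you should see a member directory containing the snap and wal folders.
root@cyberithub:~# ls -l /var/lib/etcdbkp/member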
Step 5: Modify etcd.yaml
But wait you are not done yet. There is one more step you need to perform. You need to specify the volume mountpath and host path as well in etcd.yaml
configuration file. Usually this file resides under /etc/kubernetes/manifests
directory. You can use vi editor to open the file and do the necessary changes as shown below. Once you save and exit the file, you will see that all pods, deployments and services will start getting created.
root@cyberithub:~# vi /etc/kubernetes/manifests/etcd.yaml
......................................................
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://10.13.53.9:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcdbkp
    - --initial-advertise-peer-urls=https://10.13.53.9:2380
    - --initial-cluster=master=https://10.13.53.9:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://10.13.53.9:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://10.13.53.9:2380
    - --name=master
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
..................................................
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcdbkp
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcdbkp
      type: DirectoryOrCreate
    name: etcd-data
status: {}
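Since etcd runs as a static pod, the kubelet should notice the change to etcd.yaml and recreate the pod on its own. If you want to follow the progress, you can watch the kube-system pods with the command below (it may take a minute or two for the API server to respond again while etcd restarts).
root@cyberithub:~# kubectl -n kube-system get pods -w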
Step 6: Check ETCD Container
After waiting for a few minutes, you can check whether the etcd container came up by using the docker ps -a | grep etcd command. If you see output like below, then it came up successfully.
root@cyberithub:~# docker ps -a | grep etcd
b16bee0d8755 0369cf4303ff "etcd --advertise-cl…" 4 minutes ago Up 4 minutes k8s_etcd_etcd-master_kube-system_31f1a55a7c052d741276c3bdaab988b1_0
aed3a6baf7d1 k8s.gcr.io/pause:3.2 "/pause" 4 minutes ago Up 4 minutes k8s_POD_etcd-master_kube-system_31f1a55a7c052d741276c3bdaab988b1_0
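Note that docker ps only works on nodes that use the Docker runtime. If your cluster runs on containerd or CRI-O instead, a roughly equivalent check (assuming crictl is installed and configured on the node) would be the command below.
root@cyberithub:~# crictl ps | grep etcd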
Step 7: Check Member List
Next we need to verify the member list by using the below etcdctl member list command.
root@cyberithub:~# ETCDCTL_API=3 etcdctl member list --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379
a874c87fd42044f, started, master, https://127.0.0.1:2380, https://10.13.53.9:2379
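You can also confirm that the restored etcd endpoint is healthy by running the etcdctl endpoint health and etcdctl endpoint status commands with the same certificate flags as before. The status output should show the database size and the current leader of the cluster.
root@cyberithub:~# ETCDCTL_API=3 etcdctl endpoint health --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379
root@cyberithub:~# ETCDCTL_API=3 etcdctl endpoint status --write-out=table --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=127.0.0.1:2379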
Step 8: Check Pods, Services and Deployments
By this time, you should have all the pods, services and deployments restored. You can quickly verify that by running the kubectl get pods,svc,deployments command as shown below. If you see everything in the Running state, then it means all the applications were restored successfully.
root@cyberithub:~# kubectl get pods,svc,deployments
NAME READY STATUS RESTARTS AGE
pod/test-746c87566d-qk447 1/1 Running 0 40m
pod/test-746c87566d-r9m98 1/1 Running 0 40m
pod/test-746c87566d-w7wbk 1/1 Running 0 40m
pod/hello-75f847bf79-9d6ht 1/1 Running 0 40m
pod/hello-75f847bf79-fttp9 1/1 Running 0 40m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/test-service NodePort 10.108.78.129 <none> 80:30082/TCP 40m
service/kubernetes ClusterIP 10.88.0.1 <none> 443/TCP 45m
service/hello-service NodePort 10.153.81.167 <none> 80:30080/TCP 40m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/test 3/3 3 3 40m
deployment.apps/hello 2/2 2 2 40m