Check etcd-manager version
Using kubectl:
$ kubectl -n kube-system get pod etcd-manager-main-ip-NODE-IP-ADDRESS -o yaml | grep "image:"
image: kopeio/etcd-manager:3.0.20200429
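If the cluster has several master nodes, a quick way to check the image of every etcd-manager pod at once is a custom-columns query; this is a sketch that assumes the pods follow the usual etcd-manager-main-*/etcd-manager-events-* naming:
$ kubectl -n kube-system get pods -o custom-columns='NAME:.metadata.name,IMAGE:.spec.containers[0].image' | grep etcd-manager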
According to the release notes, version 3.0.20200428 brings a fix that renews expiring certificates in the cluster.
However, as noted in GitHub issue #309, the fix is not complete:
Not a perfect fix, if you don’t restart things every now and then, they could still expire. But it’s at least closer and means if you do restart things, it will fix itself.
Rotating expired etcd client certificates
Once the etcd-manager running in the cluster is confirmed to be 3.0.20200428 or above, all that's needed to regenerate the certificates and restore the etcd cluster is restarting the Kubernetes master nodes. You could also restart just the etcd-main and etcd-events pods, but we'll go with restarting the whole EC2 instances.
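For reference, the restart can be triggered from the node itself with a plain sudo reboot, or through the AWS CLI; the instance ID below is just a placeholder:
$ aws ec2 reboot-instances --instance-ids i-0123456789abcdef0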
When the restarted instance comes back up, it will start the etcd-main and etcd-events pods, which triggers the startup checks implemented in the etcd-manager code that verify certificate expiration. If it finds that the certificates expire in 60 days or less, it will regenerate them. You should see the following output in the logs of the etcd-main and etcd-events pods:
...
I1208 08:37:56.019184 12806 main.go:299] Setting data dir to /rootfs/mnt/master-vol-ID
I1208 08:37:56.019648 12806 certs.go:106] existing certificate not valid after 2022-12-08T08:37:05Z; will regenerate
I1208 08:37:56.019655 12806 certs.go:167] generating certificate for "etcd-manager-server-etcd-c"
I1208 08:37:56.023307 12806 certs.go:106] existing certificate not valid after 2022-12-08T08:37:05Z; will regenerate
I1208 08:37:56.023350 12806 certs.go:167] generating certificate for "etcd-manager-client-etcd-c"
...
I1208 08:37:56.032081 12806 pki.go:39] generating peer keypair for etcd: {CommonName:etcd-c Organization:[] AltNames:{DNSNames:[etcd-c.internal.clstrname.k8s.local] IPs:[127.0.0.1]} Usages:[2 1]}
I1208 08:37:56.032340 12806 certs.go:106] existing certificate not valid after 2022-12-08T08:37:05Z; will regenerate
I1208 08:37:56.032346 12806 certs.go:167] generating certificate for "etcd-c"
I1208 08:37:56.041521 12806 pki.go:79] building client-serving certificate: {CommonName:etcd-c Organization:[] AltNames:{DNSNames:[etcd-c.internal.clstrname.k8s.local etcd-c.internal.clstrname.k8s.local] IPs:[127.0.0.1 127.0.0.1]} Usages:[1 2]}
I1208 08:37:56.041786 12806 certs.go:106] existing certificate not valid after 2022-12-08T08:37:05Z; will regenerate
I1208 08:37:56.041795 12806 certs.go:167] generating certificate for "etcd-c"
I1208 08:37:56.499683 12806 certs.go:167] generating certificate for "etcd-manager-etcd-c"
...
I1208 08:37:56.753268 12806 certs.go:167] generating certificate for "etcd-c"
This needs to be done on all Kubernetes master nodes/etcd pods.
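To double-check that the rotation took effect, the new expiry date can be read straight off the certificate on a master node; the path below is the kube-apiserver etcd client certificate also used in the next section, and may differ in other setups:
$ sudo openssl x509 -noout -enddate -in /etc/kubernetes/pki/kube-apiserver/etcd-client.crt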
Fixing broken Prometheus monitoring after cert rotation
Prometheus Operator relies on a predefined secret containing the etcd client certificate, client key, and etcd CA certificate; it's pre-configured here. However, this secret is not automatically updated with the certificates newly generated by etcd-manager, so it has to be updated manually.
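To confirm the secret is indeed holding a stale certificate, its client cert can be decoded and the expiry date inspected; the secret name and key here are assumed to match the commands below:
$ kubectl -n monitoring get secret etcd-certs -o jsonpath='{.data.client\.crt}' | base64 -d | openssl x509 -noout -enddate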
The easiest way to do this is to SSH into one of the Kubernetes master nodes and recreate the secret there. First, delete the secret containing the old etcd client certs:
$ kubectl -n monitoring delete secret etcd-certs
Next, recreate the secret, this time including the newly generated certificates:
$ kubectl -n monitoring create secret generic etcd-certs --from-file=ca.crt=/etc/kubernetes/pki/kube-apiserver/etcd-ca.crt --from-file=client.crt=/etc/kubernetes/pki/kube-apiserver/etcd-client.crt --from-file=client.key=/etc/kubernetes/pki/kube-apiserver/etcd-client.key
* Replace the namespace and secret name to match your setup.
The only thing left is to restart the Prometheus pod so that it mounts the newly created secret containing your renewed etcd client certificates.
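With the kube-prometheus defaults, the Prometheus pod is managed by a StatefulSet, so deleting it is enough for it to come back with the new secret mounted; the pod name below assumes those defaults and may differ in your cluster:
$ kubectl -n monitoring delete pod prometheus-k8s-0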