Using the new Rancher Backup Operator to protect your cluster

Introduction

It goes without saying that backing up Kubernetes components is an essential part of any Kubernetes administrator's job, especially in production environments. While it has always been possible to write a home-grown solution, we here at Rancher want to make it as easy as possible to protect your Rancher-based clusters. Not only does backing up Rancher components protect you in the event of a failed upgrade or other disaster, but it's also a great way to migrate Rancher to a completely different cluster.

Cluster Backup and Restore in Rancher v2.5 and Above

First, a quick history lesson. In versions of Rancher prior to v2.5, although etcd snapshots and backups were well integrated into the UI, automated snapshots were cluster dependent. Kubernetes clusters stood up by RKE (Rancher Kubernetes Engine) and K3s (with an external database) meshed well, but things became tricky when moving to cloud-managed Kubernetes services such as Amazon EKS.

Most providers of managed Kubernetes do not give users enough access to etcd to create a snapshot, even with an external tool. Clearly, another solution was needed.

Enter the Rancher Backup Operator.

The new Rancher Backup Operator allows Rancher to be backed up, and even restored, on ANY local Kubernetes cluster. Whether Rancher is running on a kubeadm cluster, an EKS or AKS cluster in the cloud or even K3s, Rancher can be backed up and restored with Rancher v2.5.

Keep in mind that the Rancher Backup Operator is intended for backing up the Rancher app hosted in the local cluster. This is extremely useful for things like major version upgrades, cluster migration/expansion and rolling Rancher back to an earlier state. If you are looking to back up entire Kubernetes workloads and persistent volumes, something like Velero should be used in addition to the Rancher Backup Operator for complete protection. Stay tuned for further posts about how to back up live Kubernetes workloads. For now, this post will focus on the Rancher side of the house.

How Often Do I Need to Back Up/Snapshot Etcd?

The answer, like many things in life, is that it depends. A Kubernetes or Rancher administrator should first look at the overall activity of the cluster, and then determine how much data or cluster state loss is acceptable for the environment. In an extremely active production environment, you may want to take a backup of the Rancher components every couple of hours. For something that’s less important, perhaps a once-a-day snapshot will be sufficient.
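As a rough illustration of the "every couple of hours" case, a recurring backup can be expressed as a Backup resource with a cron schedule. This is a sketch, not a recommendation: the resource name, the two-hour schedule and the retention count are assumptions chosen for illustration, and it relies on a default storage location having been configured when the operator was installed.

```yaml
# Sketch of a recurring Rancher backup. The name, schedule and
# retention count are illustrative assumptions.
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: rancher-every-two-hours      # hypothetical name
spec:
  # ResourceSet that ships with the rancher-backup chart
  resourceSetName: rancher-resource-set
  # Standard cron syntax: at minute 0 of every second hour
  schedule: "0 */2 * * *"
  # Number of backup files to keep before older ones are removed
  retentionCount: 12
  # No storageLocation is given here, so the operator's default
  # storage location (set at install time) is assumed to exist.
```

For a less critical environment, a schedule such as "0 3 * * *" would take a single backup each night instead.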

How Does Rancher Backup Operator Work?

Since we can’t always connect to etcd directly (particularly in cloud environments), the Rancher Backup Operator introduces a few custom resources and talks to the kube-apiserver instead. The kube-apiserver can reach all the requested resources regardless of what the backend datastore is. A ResourceSet defines which Kubernetes resources need to be backed up, a Backup defines where backups are to be stored, and a Restore runs the process in reverse to restore a backup.

Packaged as a Helm chart, the Rancher Backup Operator can be installed via the Rancher UI, or directly via the Helm CLI. To illustrate how simple this is in the Rancher UI, perform the following steps.
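To make the idea of a resource set concrete, here is a minimal sketch of what one can look like. The chart installs its own, much longer, resource set named rancher-resource-set, so you normally do not write this yourself; the name and the two selectors below are assumptions for illustration only and should be checked against the resource set shipped with your chart version.

```yaml
# Illustrative ResourceSet: selects resources through the Kubernetes
# API by group/version, kind and name, not by reading etcd.
apiVersion: resources.cattle.io/v1
kind: ResourceSet
metadata:
  name: example-resource-set         # hypothetical name
resourceSelectors:
  # Namespaces whose names start with "cattle-"
  - apiVersion: "v1"
    kindsRegexp: "^namespaces$"
    resourceNameRegexp: "^cattle-"
  # Every resource kind in the management.cattle.io/v3 API group
  - apiVersion: "management.cattle.io/v3"
    kindsRegexp: "."
```

A Backup then refers to a resource set by name, which is why the same Backup definition works whether the datastore behind the API server is etcd or something else.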

Installation of the Operator

On the Global Clusters screen, making sure that you are logged in as an administrator user, click on the cluster explorer option for the local cluster:

Cluster Explorer Button

Once in the cluster explorer, on the top left of the dashboard, navigate to and click the “Apps” option:

Apps Button

At the main charts screen, click on the “Rancher Backups” chart:

If you would like to configure your default storage location, you can do this by selecting the “Chart Options” before installing:

Chart options

Note that if you choose the “No default storage location” option, all future backups must use an S3-compatible object store. Switching the Rancher Backup tool to an existing storage class or persistent volume later will only be possible by redeploying the application.

Hence, it is important to think about where you will store your backups at this point. For this example, I’m going to choose “No default storage location”, and then perform a one-off S3-compatible backup later.
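If you install from the Helm CLI instead of the UI, the same storage choice is made through chart values. The fragment below is a sketch of the relevant values as I understand them; treat the key names as assumptions and confirm them against the default values listed in the chart's Helm README before use.

```yaml
# values.yaml sketch for the rancher-backup chart (key names are
# assumptions; verify against the chart's documented defaults).

# Equivalent of "No default storage location": both disabled, so
# every Backup must name its own S3-compatible target.
s3:
  enabled: false
persistence:
  enabled: false

# Alternative (hypothetical settings): keep backups on a persistent
# volume provisioned from an existing storage class.
# persistence:
#   enabled: true
#   storageClass: "my-storage-class"   # hypothetical storage class
#   size: 2Gi
```

A file like this would be passed with -f to the install command shown in the Helm README.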

If you click further down onto the “Helm README,” you’ll see a view with commands on how to manually install or upgrade the helm chart via the CLI, along with all the default values the chart has set. This view is completely optional, but it’s a great way to get some useful information about the helm chart that is being installed.

Helm Options

After setting the options in the “Chart Options”, simply hit the big blue “Install” button.

The Rancher API will then proceed to install the helm charts associated with the Rancher Backup Operator, and in a few moments, you should see a new “Rancher Backups” option in the dropdown menu on the top left:

Rancher Backups Button

Clicking on that dropdown will bring you to the main page of the Rancher Backups Application:

There isn’t a lot to see yet, but that’s because we haven’t created any backups!

Before we create the backup, let’s go ahead and walk through how we would go about adding some credentials to Rancher.

Adding Backup Credentials

Let’s say you wish to send a Rancher backup to an S3-compatible storage API, such as MinIO, or to an actual Amazon S3 bucket. First, we need to add the credentials of our storage provider to the local cluster. You can see some examples of this at the following link, or simply follow along with this tutorial.

https://rancher.com/docs/rancher/v2.x/en/backups/v2.5/examples/#backup

Create a YAML file (backup_creds.yaml in this example) that defines a Secret like the following, replacing the placeholder values in the "data" field:

apiVersion: v1
kind: Secret
metadata:
  name: creds
type: Opaque
data:
  accessKey: <Enter your access key>
  secretKey: <Enter your secret key>

Keep in mind that if you plan on applying a Kubernetes Secret directly from a YAML file, the values in the "data" field need to be base64 encoded. You can run the following command to get the base64 value of a string:

$ echo -n "myawsaccesskey" | base64

That should give you the base64 encoded version of your string, which you can then put inside the data field of the yaml.
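If you would rather not encode the values by hand, Kubernetes also accepts a stringData field, which takes plain-text values and encodes them when the Secret is stored. A sketch of the same Secret in that form, with the placeholders left for you to fill in:

```yaml
# Same Secret as above, using stringData so the values can be
# written in plain text; the API server stores them base64 encoded.
apiVersion: v1
kind: Secret
metadata:
  name: creds
type: Opaque
stringData:
  accessKey: <Enter your access key>
  secretKey: <Enter your secret key>
```

Either form produces the same Secret in the cluster, so use whichever is easier to keep out of version control.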

Now, create the secret:

$ kubectl apply -f backup_creds.yaml

Great! At this point you can either create a Backup resource manually by following the YAML format in the example documentation above, or create the same resource via the Rancher Cluster Explorer UI. I'm going to demonstrate creating a backup through the cluster explorer.
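For those who prefer the manual route, a one-time backup to S3 might look roughly like the following. This is a sketch under assumptions: the backup name, bucket, folder, region and endpoint are hypothetical, and the namespace assumes the creds Secret was applied to the default namespace, as it would be with the kubectl command above.

```yaml
# One-time Backup to an S3-compatible store (no schedule field).
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: s3-one-time-backup                  # hypothetical name
spec:
  resourceSetName: rancher-resource-set     # installed with the chart
  storageLocation:
    s3:
      credentialSecretName: creds           # the Secret created above
      credentialSecretNamespace: default    # assumed namespace
      bucketName: my-rancher-backups        # hypothetical bucket
      folder: rancher                       # hypothetical folder
      region: us-west-2                     # hypothetical region
      endpoint: s3.us-west-2.amazonaws.com  # hypothetical endpoint
```

Applying this file with kubectl apply -f should have the same effect as filling out the form in the UI.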

Creating a Backup

Now, let’s go all the way back to the cluster explorer and back to the Rancher Backups application. Go ahead and click on the big create button.

create button

We will go ahead and create a one-time backup, to S3, using our credentials that we input earlier. Fill out the appropriate fields. You can also optionally add encryption via an Encryption Config Secret, which is highly recommended.
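The Encryption Config Secret wraps a standard Kubernetes EncryptionConfiguration file. A hedged sketch is shown below: the Secret name is hypothetical, and the namespace and key name reflect my understanding of what the operator expects (the chart's namespace and a key called encryption-provider-config.yaml), so verify both against the Rancher documentation for your version.

```yaml
# Sketch of an encryption config Secret for Rancher backups.
# Secret name is hypothetical; namespace and key name are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: encryptionconfig
  namespace: cattle-resources-system
type: Opaque
stringData:
  encryption-provider-config.yaml: |
    apiVersion: apiserver.config.k8s.io/v1
    kind: EncryptionConfiguration
    resources:
      - resources:
          - secrets
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: <base64-encoded 32-byte key>
```

Keep a copy of this file somewhere safe outside the cluster, because an encrypted backup cannot be restored without the same key.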

Backup Settings

Once you have populated all the required fields and selected the credential secret you created earlier, you can go ahead and create your backup.

After just a few moments, our backup to the S3-compatible store will be complete.

Backup Complete

If you view the bucket where you saved the backup, you will see that Rancher has created a timestamped .tar.gz with our cluster data. This archive contains all the relevant JSON information about our Rancher system objects, which you can view directly if the backup has not been encrypted. Neat!

Backup Tar Contents

Restoring a Backup

Oh no! Disaster has struck! Never fear, the Rancher Backup Operator makes it dead-simple to restore a previous Rancher state, or even migrate that state to a completely different Rancher cluster.

Much like creating a backup, navigate to the cluster explorer, then Rancher Backups & Restores. Here we can create a Restore job based on previous backups. Go ahead and click the "Create" button.

Create Restore

Here, you can see how Rancher auto-populates the fields with the one-time backup we created in the previous step. Of course, you could always pull from a different S3 source if needed, but we will just use the backup we created earlier.

Restore Options
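The restore can also be described as a Restore resource and applied with kubectl. The sketch below makes several assumptions: the restore name and all S3 settings are hypothetical, and the backup filename is left as a placeholder for the timestamped archive that appears in your bucket.

```yaml
# Sketch of a Restore from an S3-compatible store.
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-from-s3                     # hypothetical name
spec:
  # Name of the timestamped .tar.gz the backup wrote to the bucket
  backupFilename: <Enter your backup filename>
  storageLocation:
    s3:
      credentialSecretName: creds           # the Secret created earlier
      credentialSecretNamespace: default    # assumed namespace
      bucketName: my-rancher-backups        # hypothetical bucket
      folder: rancher                       # hypothetical folder
      region: us-west-2                     # hypothetical region
      endpoint: s3.us-west-2.amazonaws.com  # hypothetical endpoint
```

If the backup was encrypted, the Restore also needs to reference the same encryption config Secret that was used to create it.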

Now just hit the "Create" button again:

You may notice that if you attempt to refresh the page at this point, your Rancher server will become temporarily unavailable. Don't worry, this is expected behavior. Behind the scenes, Rancher is busy recreating all of the objects stored in the backup and pruning resources that are not part of it.

In just a matter of minutes, your Rancher cluster should be back up and operating. If we go back to the Rancher Backups screen under restore, we can see that our restore of objects completed successfully.

Restore Complete

Conclusion

The Rancher Backup Operator is an essential part of any Kubernetes administrator's toolkit, turning what could be a complex backup process into an effortless endeavor.

Running the backup operator in your Rancher environment can only help in the long run. Although the best backups are the ones we never have to use, luck certainly favors the prepared. Try out Rancher Backups today, and your future self will thank you.

Happy Ranching!