Back up your stuff

· by Gürkan ·


We were using a handmade Python backup application at work. While it worked, it was not practical and broke from time to time. When that happened, we had to stare at the code for half an hour and eventually ask the single person in the company who understands its "way of doing things".

So I started to look for a "better backup tool". I stumbled upon a lot of alternatives, but decided to use Restic for multiple good reasons:

  • Single binary
    It's written in Go, so it can be compiled into a single binary for each platform we need to support. No more dependency/packaging hell.
  • Supports multiple backends
    Plenty of options on where to keep your backups
  • Has deduplication
    And it's actually good at it. If your new backup only has 5 MB worth of "different" blocks, that's roughly all you'll see transferred over the wire as well.
  • Has encryption
    All you see is garbage on the backup storage, unless you have the key.
  • Fast
    Yeah, you may even have to use tools like "ionice" to slow it down where needed.
  • Restore-friendly
    This is important. I want to be able to verify my backups, search inside them and have the freedom of restoring a single file. Restic can even mount the backups read-only as a FUSE filesystem (see the example after this list).
  • Parallel-friendly
    Most of the actions are non-locking, so you can run parallel backup/restore operations on a common storage without much hassle.
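
To give a rough idea of what that restore-friendliness looks like in practice, here is a small sketch; the repository path, mount point and file names below are just placeholders:

    # Verify repository integrity and list what's in it
    restic -r /srv/restic-repo check
    restic -r /srv/restic-repo snapshots

    # Search for a file across snapshots and restore just that one file
    restic -r /srv/restic-repo find nginx.conf
    restic -r /srv/restic-repo restore latest --target /tmp/restored --include /etc/nginx/nginx.conf

    # Or browse everything read-only through FUSE
    restic -r /srv/restic-repo mount /mnt/restic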

Initial tests proved that Restic was good enough for our usage. But with the amount of data we need to store, we also needed a solid storage backend. First I tried REST Server, a dedicated server written for Restic. But scalability questions, plus noticing the repository staying stale for quite a long time, led me to a conclusion: it's really a proof of concept aimed at small deployments.

My second try was Minio, which hit the bullseye. A neat object storage with some experience/opinions behind it:

  • Single binary
    Yep, also written in Go.
  • S3 compatible
    It's nice to know that clients won't need much rework if you want to change the storage later, since there are some pretty usable S3 implementations around, including S3 itself.
    This also means you get IAM-style policies to play with: read-only or append-only buckets, for example. Nice.
  • Not so scalable
    But since this is a design choice (and you have ways to scale vertically if you don't absolutely need a single bucket), that's OK as long as it increases stability.
  • Stable & fast
    Resource usage is also lightweight, even for operations like mirroring.

Initially I've set up single nodes, but it turns out clustering might be a better choice for the next steps, so I'm now investigating the possibilities.
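
For reference, a minimal single-node setup is little more than the binary, a data directory and the master credentials. The path and credentials below are placeholders, and newer Minio releases name the variables MINIO_ROOT_USER / MINIO_ROOT_PASSWORD instead:

    # Master ("backup master") credentials for this instance
    export MINIO_ACCESS_KEY="backup-master"
    export MINIO_SECRET_KEY="use-a-long-random-secret-here"

    # Serve everything under /srv/minio-data on port 9000
    minio server --address ":9000" /srv/minio-data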

Performance and speed were very good overall. Initial backups took some time, since they needed to upload a lot of new blocks to the repository, but we've successfully backed up a few TBs of data across repositories of varying sizes.

Here is a small summary of how I chose to set up the Restic + storage backup environment. Keep in mind I've puppetized the whole setup and it's only as complicated as our requirements and expectations made it, but I think this could be a valid setup for most people.

Minio side

  • Every project/group will have its own bucket
    You need to decide how granular you'd like to go, since this affects Restic's deduplication. A server with its own bucket will only deduplicate that server's files. Combining more servers into the same bucket gains more space, and therefore speed. But you might want to consider security here, since putting more into a single bucket means a compromised host can read more of your backups.

  • Each project/group will use its own user, which will only have append-only rights to its bucket
    So a compromised server/group won't be able to remove backups. You can use a policy generator to create fine-tuned roles (a sketch of such a policy follows this list). What you need for each bucket is:

    • s3:GetObject and s3:PutObject for arn:aws:s3:::bucket_name/*
    • s3:DeleteObject for arn:aws:s3:::bucket_name/locks/*
    • s3:CreateBucket (optional), s3:GetBucketLocation and s3:ListBucket for arn:aws:s3:::bucket_name

  • There will be one "backup master" role per minio instance, which has global read/write rights
    This user will be used by a privileged restic instance to clean old backups. You can supply these master credentials via environment variables to the service, or you can get them from the logs once you run it.
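
Put together, an append-only policy for one bucket could look roughly like the sketch below. The bucket, user and alias names are placeholders, and the exact mc admin subcommands for creating users and attaching policies differ between mc releases, so treat the last lines as a sketch rather than copy-paste material:

    # Append-only policy for the "projectx" bucket
    cat > projectx-append-only.json <<'EOF'
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
          "Resource": ["arn:aws:s3:::projectx"]
        },
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject"],
          "Resource": ["arn:aws:s3:::projectx/*"]
        },
        {
          "Effect": "Allow",
          "Action": ["s3:DeleteObject"],
          "Resource": ["arn:aws:s3:::projectx/locks/*"]
        }
      ]
    }
    EOF

    # "myminio" is an mc alias already pointing at the instance with master credentials.
    # Create the bucket and the user, then attach the policy
    # (subcommand names vary between mc versions).
    mc mb myminio/projectx
    mc admin user add myminio projectx-backup some-long-secret
    mc admin policy add myminio projectx-append-only projectx-append-only.json
    mc admin policy set myminio projectx-append-only user=projectx-backup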

Restic side

Server

Restic is a client-only tool, but you need to use one instance as a "manager", which can do two things for you:

  • Initially create the encrypted Restic repositories you'll use
    If you don't want to grant your "normal" users this right.
  • To prune unused blocks in the backup
    "forget" is the term for marking unused blocks, which is relatively fast. "prune" is to actually remove them. This might take quite a bit of time since it can re-arrange and reindex the blocks within the backup storage. Keep in mind that these operations requires an exclusive lock for Restic's repository, so there needs to be a window scheduled for pruning or you should catch/retry if there is a lock, since your normal backup operations will not work while you're doing these operations.

So this user should use the aforementioned "backup master" or a similar "write access" role on the Minio side.
Restic also supports a lot of retention options for forget; I don't think you'll need more than what's built in (see the sketch below).
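
As a sketch, the manager's routine boils down to something like this; the repository address, key file path and retention numbers are just examples:

    # Credentials of the privileged "backup master" user
    export AWS_ACCESS_KEY_ID="backup-master"
    export AWS_SECRET_ACCESS_KEY="use-a-long-random-secret-here"
    export RESTIC_REPOSITORY="s3:http://minio.internal:9000/projectx"
    export RESTIC_PASSWORD_FILE="/etc/restic/projectx.key"

    # One-time: create the encrypted repository
    restic init

    # Periodically: mark old snapshots as unused, then actually remove the data.
    # prune takes an exclusive lock, so schedule it for a quiet window.
    restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12
    restic prune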

Client

Just setting the following environment variables will be enough for restic:

  • RESTIC_REPOSITORY: The address of the minio instance, like s3:http://remote_host:port/bucket_name
  • RESTIC_PASSWORD_FILE: Local path to a file containing the repository password; a bit more secure than putting it in an environment variable.
  • AWS_ACCESS_KEY_ID: Minio user ID
  • AWS_SECRET_ACCESS_KEY: Minio secret access key (sadly, not possible to supply via password file yet)
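
So a client-side backup run ends up looking roughly like this; the host, bucket, paths and credentials are placeholders:

    export RESTIC_REPOSITORY="s3:http://minio.internal:9000/projectx"
    export RESTIC_PASSWORD_FILE="/etc/restic/projectx.key"
    export AWS_ACCESS_KEY_ID="projectx-backup"
    export AWS_SECRET_ACCESS_KEY="some-long-secret"

    # Back up the interesting paths, skipping what we don't care about
    restic backup /etc /var/www --exclude /var/www/cache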

Some key notes

  • Use a cache while you're doing restic operations
    Running the restic forget command with a common per-project cache directory (--cache-dir, also used in the wrapper sketch at the end) makes it way faster
  • Test your actions and check verbose output
    --dry-run will clearly show you what is going to happen; I had to run forget a few times with it to see which snapshots would be removed and for what reason
  • forget is cheap, prune is not
  • Try running the unlock command before your automated operations
    If an active lock is present, even if you remove it, it will be re-created by the still-running process anyway
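
Tying those notes together, a cron-driven wrapper could be sketched like this; the paths, priorities and backup set are assumptions, not a description of our exact setup:

    #!/bin/sh
    # Assumes RESTIC_REPOSITORY, RESTIC_PASSWORD_FILE and the AWS_* variables
    # are already exported, e.g. from an environment file.

    CACHE_DIR=/var/cache/restic

    # Clear stale locks left over from crashed runs before starting;
    # a still-running process will keep or re-create its own lock anyway.
    restic --cache-dir "$CACHE_DIR" unlock

    # Keep the backup gentle on disk and CPU
    ionice -c2 -n7 nice -n19 \
        restic --cache-dir "$CACHE_DIR" backup /etc /var/www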