Tarsnap key rotation with tarsnap-recrypt and tarsnap-keyregen

Tarsnap is fabulous Unix backup software by Colin Percival. If you've used it for a while, you may have had to run tarsnap-recrypt to re-encrypt data with a different key. Maybe to deal with a security bug, maybe for other reasons.

Turns out that doing this in a production environment is kinda involved. I had to bug Colin with a lot of questions to work out the details. So, I figured the least I could do is write down the answers in one place for others to find.

You too can rotate keys with tarsnap-recrypt

So, read on to learn how to use tarsnap-recrypt and tarsnap-keyregen in a real-world production environment. Do note that much of the copy here is taken directly from Colin's emails, and is used here with his permission.

Executive summary

Running tarsnap-recrypt on a production server is complicated. You have more data than can be recrypted between regular backup cron jobs, and the server holds limited tarsnap keys that lack the bits recrypt requires. You can solve this by running (and re-running) tarsnap-recrypt on a different server that has the full keys. This can be done in parallel with regular backup jobs, and you can set things up so that the recrypt does not interfere with them until it's done.

What we actually did

# everything is done on a trusted workstation, not on the server

# copy server cache directory to workstation (the trailing slash on
# the source copies its contents rather than nesting a cache/ subdir)
rsync -avz -e ssh server.example.com:/var/tarsnap/cache/ \
      ~/tarsnap/cache/server.old/

# create new key
tarsnap-keyregen --keyfile ~/tarsnap/keys/server.new.key \
                 --oldkey ~/tarsnap/keys/server.old.key \
                 --user me@example.com --machine server

# run recrypt.  This can take a long time (days), and the final delete
# transaction will fail because server cron jobs will have run
# in the meantime
tarsnap-recrypt --oldkey ~/tarsnap/keys/server.old.key \
                --oldcachedir ~/tarsnap/cache/server.old \
                --newkey ~/tarsnap/keys/server.new.key \
                --newcachedir ~/tarsnap/cache/server.new

# copy server cache directory locally again.  It got changed by the
# cron job (trailing slash on the source, as before)
rsync -avz -e ssh server.example.com:/var/tarsnap/cache/ \
      ~/tarsnap/cache/server.old/

# run recrypt again.  This one will run quickly, final delete
# transaction will succeed and delete old data.
tarsnap-recrypt --oldkey ~/tarsnap/keys/server.old.key \
                --oldcachedir ~/tarsnap/cache/server.old \
                --newkey ~/tarsnap/keys/server.new.key \
                --newcachedir ~/tarsnap/cache/server.new

# sync new cache dir back to server
rsync -avz -e ssh ~/tarsnap/cache/server.new/ \
      server.example.com:/var/tarsnap/cache/

# create a restricted version of the new key
tarsnap-keymgmt --outkeyfile \
                ~/tarsnap/keys/server.new.restricted.key \
                -w ~/tarsnap/keys/server.new.key

# upload new restricted key to the server
scp ~/tarsnap/keys/server.new.restricted.key \
    server.example.com:/etc/tarsnap/server.restricted.key

Basics

tarsnap-recrypt downloads and decrypts data using old-key-file and re-encrypts and uploads it using new-key-file. After all the data has been re-uploaded, tarsnap-recrypt deletes the data using old-key-file so that the only remaining copy of the data is encrypted using new-key-file. The key file new-key-file must have been generated by tarsnap-keyregen(1) with old-key-file.

Why does tarsnap-recrypt need special keys from tarsnap-keyregen?

Tarsnap has some keys which need to stay the same when re-encrypting data; for example, there is a key used for mapping archive names to the 256-bit names which identify metadata blocks. If this key is changed, Tarsnap won't be able to read archives since it won't be able to find the right metadata blocks. (The other two keys which need to remain constant when re-encrypting relate to how Tarsnap splits file data into chunks -- if these keys change, Tarsnap will still be able to read archives, but when creating new archives it won't produce the same series of chunks, thus resulting in duplicated data.)

The difference between tarsnap-keygen and tarsnap-keyregen is essentially just that tarsnap-keyregen keeps the keys which need to remain constant.

Why is using tarsnap-recrypt faster than the alternative?

An alternative to tarsnap-recrypt would be to register a new key with tarsnap-keygen and to run tarsnap -r --keyfile old-key | tarsnap -c @- --keyfile new-key for each archive in the old archive space.

The difference is that tarsnap-recrypt avoids downloading duplicated blocks multiple times. This is because tarsnap does deduplication only during backup, not during restore.
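
For comparison, the per-archive alternative might look roughly like the dry-run sketch below. It only prints the pipelines and runs nothing. The key paths are placeholders, copy_all_cmds is a hypothetical helper (not a Tarsnap tool), and the -f flags are an assumption about how each archive would be named on both sides.

```shell
#!/bin/sh
# Dry-run sketch of the slower per-archive alternative.
# Assumptions: key paths are placeholders; copy_all_cmds is a hypothetical
# helper, not a Tarsnap tool. Archive names containing single quotes would
# need extra escaping.
OLD_KEY=${OLD_KEY:-$HOME/tarsnap/keys/server.old.key}
NEW_KEY=${NEW_KEY:-$HOME/tarsnap/keys/server.new.key}

# Read archive names on stdin, one per line, and print the copy pipeline
# for each. Nothing is executed; review the output before running it.
copy_all_cmds() {
    while IFS= read -r archive; do
        [ -n "$archive" ] || continue
        printf "tarsnap -r -f '%s' --keyfile '%s' | tarsnap -c -f '%s' --keyfile '%s' @-\n" \
            "$archive" "$OLD_KEY" "$archive" "$NEW_KEY"
    done
}

# Example (requires tarsnap and the old key):
#   tarsnap --list-archives --keyfile "$OLD_KEY" | copy_all_cmds
```

Each printed pipeline restores one archive in full, so blocks shared between archives are downloaded once per archive that contains them.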

This is actually a huge deal. For example, given the following archive space:

tarsnap --print-stats --humanize-numbers
                                       Total size  Compressed size
All archives                                 3 TB           550 GB
  (unique data)                             81 GB            20 GB

tarsnap-recrypt will download 20 GB, while the per-archive tarsnap approach will download 550 GB. In both cases 20 GB will be uploaded.
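
To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes both approaches download at the same rate and uses 1 GB per hour, the figure I measured on my own connection (quoted later in this post). Treat the result as an order-of-magnitude estimate, not a benchmark.

```shell
#!/bin/sh
# Compressed sizes in GB, taken from the --print-stats output above.
all_archives_gb=550
unique_gb=20
# Assumed download rate in GB per hour (the author's measured figure).
rate_gb_per_hour=1

# Convert whole hours to whole days, rounding down.
hours_to_days() {
    echo $(( ($1 + 23) / 24 ))
}

recrypt_hours=$(( unique_gb / rate_gb_per_hour ))
naive_hours=$(( all_archives_gb / rate_gb_per_hour ))
# How many times more data the per-archive approach downloads.
download_ratio=$(awk -v a="$all_archives_gb" -v u="$unique_gb" \
    'BEGIN { printf "%.1f", a / u }')

echo "tarsnap-recrypt: ${unique_gb} GB, about ${recrypt_hours} hours"
echo "per-archive copy: ${all_archives_gb} GB, about ${naive_hours} hours ($(hours_to_days "$naive_hours") days)"
echo "download ratio: ${download_ratio}x"
```

Under these assumptions the difference is less than a day versus more than three weeks of downloading.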

What permission bits does tarsnap-recrypt require?

Read/write/delete for the old key, read/write for the new key.

What are the concurrency characteristics of tarsnap-recrypt?

tarsnap-recrypt will lock the old and new cache directories while it runs.

If there is a backup in progress when you launch tarsnap-recrypt, it will fail immediately with a "cannot lock old cache directory" message. If you try to start another backup after you've started tarsnap-recrypt, the new tarsnap process will exit with the same error.

Is locking of old "archive space" done server side? Meaning if I run recrypt on one machine (with delete bits), and a cron job starts on another machine, will the cron job fail?

It's done on both sides, actually -- it locks the local cache directory to make sure that another tarsnap process can't start while it's running, and it tells the server "I'm starting a transaction, please cancel any transaction which is currently in progress".

This could actually be used to your advantage, albeit by working around the safety measures built into tarsnap: If you

  1. Copy the cache directory for the old keys to a new location (even to a different system);
  2. Run tarsnap-recrypt, with delete keys, with the new cache directory; and
  3. Run cron jobs as normal on the original cache directory,

then the tarsnap cron jobs will cancel the delete transaction tarsnap-recrypt is running on the old machine -- ultimately making tarsnap-recrypt fail, but only after it finishes copying all the data across.

Then once the first pass is done, you can

  1. Stop your cron jobs;
  2. Copy the cache directory again (so that tarsnap-recrypt can "see" the current state, including the most recently uploaded blocks);
  3. Run tarsnap-recrypt again (which will copy the new blocks and then delete everything from the old archive space); and
  4. Re-enable the cron job but using the new cache directory.

What to do if I can't find a window for recrypt to run without interfering with regular tarsnap backup jobs?

As long as the existing jobs are only creating new archives, you should be able to do this by throwing a "killall tarsnap-recrypt" before each time you create a new archive. It will result in some work being wasted, but tarsnap-recrypt checkpoints its progress at least once per GB of data, so unless you have very frequent backups you'll get through eventually.
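
If you go the killall route, something has to start tarsnap-recrypt again after each kill. A minimal sketch of a retry helper follows. retry_until_success is a hypothetical function, not part of Tarsnap, and the attempt limit and delay are arbitrary assumptions. The cap matters because the helper cannot tell a killed run from a permanent failure.

```shell
#!/bin/sh
# Sketch only: rerun a command until it exits 0, up to a maximum number
# of attempts. retry_until_success is a hypothetical helper, not part of
# Tarsnap.
#   $1 = maximum attempts, $2 = seconds to wait between attempts,
#   remaining arguments = the command to run
retry_until_success() {
    max_attempts=$1
    delay_seconds=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt exited non-zero; retrying" >&2
        attempt=$((attempt + 1))
        sleep "$delay_seconds"
    done
    return 1
}

# Example (requires tarsnap; paths as used earlier in this post):
#   retry_until_success 100 300 tarsnap-recrypt \
#       --oldkey ~/tarsnap/keys/server.old.key \
#       --oldcachedir ~/tarsnap/cache/server.old \
#       --newkey ~/tarsnap/keys/server.new.key \
#       --newcachedir ~/tarsnap/cache/server.new
```

The delay between attempts gives a running backup time to finish and release the cache directory lock before the next try.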

Does the target archive space need to be empty?

Yes, tarsnap-recrypt will fail if there's data in the new archive space which isn't in the old archive space.

Will recrypt be able to continue from checkpoint if more archives are created in the source space after recrypt is interrupted?

Yes.

The server where this tarsnap runs normally contains read/write keys with no delete bit for security reasons. Can recrypt work with such limited keys?

No.

tarsnap-recrypt will try to delete everything from the old archive space after it finishes copying; if it can't do that it will exit with an error (but by that point the important work is done).

How fast does tarsnap-recrypt work?

You should be able to get at least 10-20 Mbps if your internet connection is fast enough. It uses 8 TCP connections to read blocks and another 8 to write blocks back. [I got 8 Mbps (1 GB per hour) on a cable Internet connection]

Is there a way to monitor the progress of recrypt?

Yes, it displays its progress.
