Incremental backups on Unix systems are annoying. For a start, you have to take a “full snapshot” fairly regularly, otherwise it takes rather a lot of time/effort/disk usage to restore to, say, yesterday (which is the most likely place you want to restore to). You can’t delete older backups willy-nilly. You also can’t compress older backups if you need to compare them to now (e.g. with rsync’s hard-links feature). Grrr.

So I got to thinking, and came up with a theory: having rsynced data to a new backup folder using a previous backup folder as --link-dest, you could then go into that previous folder, delete any file whose hard-link count is >= 2 (i.e. it also exists, unchanged, in your new folder), keep a list of those deleted files for future reference, and Bob’s your uncle. You can then delete/compress/whatever that old folder which, now holding only what’s different from today, would be pretty small to start with. More to the point, to get back to a given point in time you *start* from today and work backwards, thus making the most likely option of “restore to yesterday” really easy.
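The pruning half of that theory can be sketched in a few lines of Python. This is just an illustration of the idea, not anything rsync provides: `prune_old` is a made-up name, and the assumption is that rsync’s --link-dest has already hard-linked every unchanged file between the two folders, so a link count of 2 or more marks a file as redundant in the old one.

```python
import os

def prune_old(old_dir, new_dir):
    """Delete files in old_dir that are hard-linked into new_dir
    (link count >= 2), returning the list of deleted paths."""
    deleted = []
    for root, _dirs, files in os.walk(old_dir):
        for name in files:
            path = os.path.join(root, name)
            # st_nlink >= 2 means this inode also lives in the newer
            # backup, so the old copy is redundant and can go.
            if os.stat(path).st_nlink >= 2:
                os.remove(path)
                deleted.append(os.path.relpath(path, old_dir))
    return deleted
```

The returned list is the “for future reference” record: together with what’s left in the old folder (only the files that changed), it’s enough to reconstruct that day’s state by walking backwards from the current backup.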

I decided to call this general process Reverse Incremental Backups. So I googled it to see if anyone else had come up with the concept. And they had…

rdiff-backup has basically exactly the same functionality as my theory, but ready-made and wrapped up in a neat package. It’s brilliant. It uses the rsync library, so you get pretty much all the functions and options of rsync, but it keeps a special “rdiff-backup-data” folder in your backup dir, which contains compressed reverse diffs and other bits & pieces. You can restore with one simple command to any point in time that you’ve got the reverse diffs for. You can also run with the --remove-older-than option regularly to keep only, say, the last month of data.

This is going to reduce the amount of space that backups use by rather a lot, and also increase the frequency with which I can take backups (e.g. several times a day for critical servers). Hooray for rdiff-backup!