Saturday, December 12, 2009

Backups and File Deduplication

Yesterday I heard about a "No Space Left" error on a backup machine. Obviously, the first solutions proposed were along the lines of "Buy a new disk, they don't cost much" or "Try this big-brand backup product"... and then there was my simple custom solution, which was rejected. :)

So, if you're a little bit brave and able to write a few lines of Python, you can have a very flexible backup system with file deduplication.

The main idea is...

  • For each home directory that you have to back up, store a key/value list of file_path/shasum (or any hash function that you trust).

  • For each file in the home directory, check whether it is already on the backup server (files are stored on the backup server with their shasum as the name). If the file is not there, upload it.
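
The two steps above can be sketched in a few lines of Python. This is only a rough sketch under some assumptions of mine: a local directory stands in for the backup server, the list is written as JSON, SHA-1 (what shasum computes by default) is the hash, and the names sha1_of and backup are made up for the example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(home, store, list_path):
    """Copy unseen file contents into `store` and write a path -> hash list.

    `store` is a local directory standing in for the backup server
    (an assumption of this sketch); `list_path` is the JSON list file.
    """
    home, store = Path(home), Path(store)
    store.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(home.rglob("*")):
        # Only regular files are handled here; symlinks are skipped.
        if path.is_symlink() or not path.is_file():
            continue
        digest = sha1_of(path)
        manifest[path.relative_to(home).as_posix()] = digest
        target = store / digest
        if not target.exists():  # already stored: nothing to upload
            shutil.copyfile(path, target)
    Path(list_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

Running it again on an unchanged home copies nothing new into the store; only a fresh list file is written.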


This way, the backup server has two "folders": one that contains all the backed-up files (storing them by shasum means that each distinct file is on the server just once), and one that contains the lists: home-th30z-10-Nov-2009.list, home-th30z-11-Nov-2009.list, and so on.

This way you can say... Hey, fully restore my home as of a specified date, or just pull out a single file as it was on another date... and maybe add whatever other features you want.
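
A restore could look roughly like this. Again, just a sketch with assumptions of mine: each .list file is a JSON object mapping relative paths to hashes, the stored files sit in a local directory named by hash, and the function name restore and its only parameter are invented for the example.

```python
import json
import shutil
from pathlib import Path


def restore(list_path, store, dest, only=None):
    """Rebuild files under `dest` from a path -> hash list.

    `list_path` is the list saved for the date you want. If `only` is
    given, just that one relative path is restored instead of everything.
    """
    manifest = json.loads(Path(list_path).read_text())
    if only is not None:
        manifest = {only: manifest[only]}
    for rel_path, digest in manifest.items():
        target = Path(dest) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        # The stored file is named after its hash, so this is a plain copy.
        shutil.copyfile(Path(store) / digest, target)
    return sorted(manifest)
```

Picking the date is then just a matter of picking which list file you pass in.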

This is an easy and flexible way to store your backups without wasting space, while keeping fine-grained control over what you restore...

If you've really grokked the idea, you can start your own start-up company focused on backups. :D
