Backup to Amazon S3 using Duplicity
Friday, 26 June 2009 14:28
After my VPS provider was hacked and I got a new VPS from them, I also got a new problem.

My new VPS has no HyperVM control panel (why?), so I was left without an easily managed backup solution. (HyperVM has a built-in backup feature.)

So I had to find a new solution for my backup needs.

Open Source Backup Applications

I wanted an application that can run automated full and incremental remote backups and that is available in the Debian package repository.

I looked at several open source backup applications:

Bacula

Zmanda

rsync

rdiff-backup

duplicity

Backupninja

I found duplicity the best solution for my needs. (Someday I will test Backupninja.) Duplicity supports lots of protocols, is space and bandwidth efficient, and is secure because it uses encryption and signing.

Storage for my backups

I didn't want to order and set up a new VPS just for my backups, so I started looking for cost-effective remote disk space. After a short search I found that Amazon S3 was the best fit for me. It is cheap, there is no minimum fee, I pay only for what I use, and duplicity supports the Amazon S3 protocol.

So I opted for duplicity and Amazon S3.

What is duplicity?

Duplicity backs [up] directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.

What is Amazon S3?

Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers. Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

The Backup Guide

If you also want to back up your data to Amazon S3 with duplicity, stay with me. I will demonstrate how to create a simple wrapper script for duplicity that automatically creates GPG-encrypted incremental backups and saves them to an Amazon S3 bucket.

Sign up for Amazon S3

Sign up for an Amazon Web Services account, then add the Simple Storage Service (S3). This will involve giving them your credit card number, to be charged monthly.

Make sure to write down your Secret Access Key along with your Access Key ID. You'll need both for any application that interacts with S3.

Graphical Amazon S3 interfaces

If you want to manage your S3 account and buckets through a graphical interface, download JetS3t (free/open source) or S3 Browser (freeware, free for non-commercial use).

Package installation

You need to install duplicity and python-boto (which allows Python to talk to S3). You also need GnuPG and librsync, but both should be installed automatically as dependencies of duplicity. (I use Debian 5 on my VPS.)

apt-get install duplicity python-boto
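Before going further, you may want to check that the tools are really installed. Below is a minimal bash sketch. It assumes the commands are named `duplicity` and `gpg` and that `python` is the interpreter python-boto was installed for. `missing_commands` is a hypothetical helper, not part of any package.

```shell
#!/bin/bash
# Hypothetical helper (not part of duplicity): print the name of every
# command given as an argument that cannot be found on this system.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" > /dev/null 2>&1 || echo "$cmd"
    done
}

# Example usage (commented out so the snippet has no side effects):
# missing_commands duplicity gpg              # prints nothing if both exist
# python -c 'import boto' && echo "boto OK"   # checks the python-boto module
```

If `missing_commands` prints nothing and the `python` line prints "boto OK", the packages are in place.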

Encryption and signing keys

You should encrypt your files so that they are safe from prying eyes in transit and in storage. Signing them protects the files from alteration in storage or transit.
You can use separate keys for encryption and signing, but in this guide I use a single key for both.

Generate a new GPG key

(If you already have a GPG key that you want to use, skip this step.)

You need sufficient permissions on the files and directories you want to back up, so I strongly suggest running your backup jobs as root. Open a terminal and become root.

Now run "gpg --gen-key" to generate your key and follow the prompts:

# gpg --gen-key
gpg (GnuPG) 1.4.9; Copyright (C) 2008 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
(1) DSA and Elgamal (default)
(2) DSA (sign only)
(5) RSA (sign only)
Your selection?

Accept the default (press Enter) or type 1 for DSA and Elgamal.

DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048)

The default (2048) is fine. Just press Enter.

Requested keysize is 2048 bits
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0)

If, like me, you don't want your key to expire, just press Enter again to accept the default. Otherwise, choose whatever validity period you want.

Key does not expire at all
Is this correct? (y/N)

Type y and press Enter.

You need a user ID to identify your key; the software constructs the user 
ID from the Real Name, Comment and Email Address in this form:
"Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Duplicity Backup
Email address: backup@yourdomain
Comment: Key for duplicity
You selected this USER-ID:
"Duplicity Backup (Key for duplicity) <backup@yourdomain>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit?

Enter the requested details, then type O for Okay.

You need a Passphrase to protect your secret key.

Enter Passphrase:

Enter your passphrase, then re-enter it when prompted.

Your passphrase should be long and complex. If you have password management software on your PC, such as KeePass, KeePassX or RoboForm, you can use it to generate the passphrase and to store it.
You can also generate a passphrase with an online tool. Anything will do, but make sure you remember it, because you'll need it later.

gpg: key 9929DAB1 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 2 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 2u
pub 1024D/9929DAB1 2007-11-15
Key fingerprint = 3378 8E93 4349 0E7F 44F3 7C81 2460 5A11 9929 DAB1
uid DuplicityBackup (Key for Duplicity) <backup@yourdomain>
sub 2048g/5385A6BB 2007-11-15

Make a note of the key ID (in this case, 9929DAB1), as we'll need it later too.
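If you lose track of the key ID, you can look it up again later. Below is a minimal bash sketch. It assumes the colon-delimited listing format of `gpg --with-colons`, in which field 5 of a `pub` record holds the 16-hex-digit long key ID; its last 8 digits are the short ID used in this guide. `gpg_short_key_id` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print the 8-hex-digit short key ID of the first public
# key whose user ID matches $1. In gpg's --with-colons output, field 5 of a
# "pub" record is the long key ID; the short ID is its last 8 characters.
gpg_short_key_id() {
    gpg --list-keys --with-colons "$1" 2>/dev/null |
        awk -F: '$1 == "pub" { print substr($5, length($5) - 7); exit }'
}

# Example usage (commented out):
# GPG_KEY=$(gpg_short_key_id "Duplicity Backup")
# echo "$GPG_KEY"
```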

Remember to back up your GPG key pair to a safe place off the current machine. Without this key pair your backups are totally useless to you. This article shows the proper way to export (and import) your GPG key pair.
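Below is a minimal sketch of such an export and import using standard gpg options (`--export`, `--export-secret-keys`, `--export-ownertrust`, `--import`, `--import-ownertrust`). The function names and the file names written to the output directory are hypothetical. Copy the resulting directory to another machine or to offline media, and guard it as carefully as the passphrase itself.

```shell
#!/bin/bash
# Hypothetical helpers: back up and restore a GPG key pair plus the trust list.

# export_gpg_keypair KEYID OUTDIR
# Writes ASCII-armored public and secret keys and the owner trust list to OUTDIR.
export_gpg_keypair() {
    local key="$1" outdir="$2"
    (
        umask 077   # new directory and files readable by the owner only
        mkdir -p "$outdir" &&
        gpg --armor --export "$key" > "$outdir/$key-public.asc" &&
        gpg --armor --export-secret-keys "$key" > "$outdir/$key-secret.asc" &&
        gpg --export-ownertrust > "$outdir/ownertrust.txt"
    )
}

# import_gpg_keypair KEYID INDIR
# Re-imports the files written by export_gpg_keypair, e.g. on a new machine.
import_gpg_keypair() {
    local key="$1" indir="$2"
    gpg --import "$indir/$key-public.asc" "$indir/$key-secret.asc" &&
    gpg --import-ownertrust "$indir/ownertrust.txt"
}

# Example usage (commented out):
# export_gpg_keypair 9929DAB1 /root/gpg-backup
# import_gpg_keypair 9929DAB1 /root/gpg-backup
```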

The backup wrapper script

This bash wrapper script, run daily from cron, does a full backup every month and incremental backups in between. It also deletes backup sets older than a given age (three months in the script below; change the value to suit you). Each day it emails a log report that gives some useful statistics about your backup and reports any errors.

You will need to have the following information handy to edit this backup script for your needs:

  • Your AWS Access Key ID
  • Your AWS Secret Access Key
  • Your GPG key
  • Your GPG key passphrase
  • A list of directories you want to back up
  • An email address to send the logs to
  • A unique name for an Amazon S3 bucket (the bucket will be created if it doesn't yet exist)

The script follows. At a minimum you need to change the placeholder values: the paths of the key files, GPG_KEY, DEST and MAILADDR. Also review all the other variables and the list of included directories, as you may want to tweak them to suit your needs.

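Note that includes and excludes work on a 'first match' basis: for each file, the first include or exclude option that matches it decides whether it is backed up. So if you want to exclude something inside a directory you include, you need to exclude the file or subdirectory before including the directory. For more information, see the duplicity man page.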
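To make the first-match rule concrete, here is a deliberately simplified bash model of it. It is a hypothetical helper for illustration only, not duplicity's actual selection code, and it follows these rules:

- Each rule is written as `+ PATH` (include) or `- PATH` (exclude).
- A rule matches a file that equals PATH or lies below it.
- `/**` matches everything.
- As the duplicity man page describes, a file that matches no rule is included.

```shell
#!/bin/bash
# Simplified model of duplicity's first-match file selection (hypothetical,
# for illustration; real duplicity also supports globs, regexps, etc.).
# Usage: first_match FILE RULE...   where each RULE is "+ PATH" or "- PATH".
first_match() {
    local file="$1" rule sign path
    shift
    for rule in "$@"; do
        sign=${rule%% *}          # "+" or "-"
        path=${rule#* }           # the path part of the rule
        if [ "$path" = "/**" ] || [ "$file" = "$path" ] ||
           [ "${file#"$path"/}" != "$file" ]; then
            # The first matching rule decides.
            if [ "$sign" = "+" ]; then echo included; else echo excluded; fi
            return
        fi
    done
    echo included                 # no rule matched
}

# Exclude before include: the tmp directory is skipped.
first_match /home/user/tmp/x '- /home/user/tmp' '+ /home' '- /**'   # excluded
# Include before exclude: the exclude rule is never reached.
first_match /home/user/tmp/x '+ /home' '- /home/user/tmp' '- /**'   # included
```

The second call shows the pitfall: once `+ /home` matches, the later exclude has no effect.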

Before you customize the script, put your AWS Access Key ID, AWS Secret Access Key and GPG passphrase into three separate root-owned text files and chmod these files to 600. The wrapper script reads the information from these files.

Example locations for these files (you can change them, of course):

/root/files/aws-access-key-id

/root/files/aws-secret-access-key

/root/files/gpg-passphrase
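One way to create those files without the secrets ending up in your shell history is sketched below. It assumes bash (for `read -s`), that you run it as root, and the example paths above. `store_secret` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read a secret from standard input without echoing it
# (when typed at a terminal) and store it in a file only the owner can read.
# Usage: store_secret PATH
store_secret() {
    local file="$1" secret
    IFS= read -r -s -p "Value for $file: " secret || return 1
    echo >&2                       # finish the prompt line
    (
        umask 077                  # new directory/file: owner-only access
        mkdir -p "$(dirname "$file")" &&
        printf '%s\n' "$secret" > "$file"
    ) && chmod 600 "$file"         # also fixes permissions of an existing file
}

# Example usage, run as root (commented out):
# store_secret /root/files/aws-access-key-id
# store_secret /root/files/aws-secret-access-key
# store_secret /root/files/gpg-passphrase
```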

The script:

#!/bin/bash
# Set up some variables for logging
LOGFILE="/var/log/backup.log"
DAILYLOGFILE="/var/log/backup.daily.log"
HOST=$(hostname)
DATE=$(date +%Y-%m-%d)
MAILADDR="backup@yourdomain"

# Clear the old daily log file
: > "${DAILYLOGFILE}"

# Trace function for logging, don't change this
trace () {
stamp=$(date +%Y-%m-%d_%H:%M:%S)
echo "$stamp: $*" >> "${DAILYLOGFILE}"
}

# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID=$(cat /your/aws-access-key-id-textfile)
export AWS_SECRET_ACCESS_KEY=$(cat /your/aws-secret-access-key-textfile)
export PASSPHRASE=$(cat /your/gpg-passphrase-textfile)

# Your GPG key
GPG_KEY=Your_GPG_Key

# The source of your backup
SOURCE=/

# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST=s3+http://your_s3_bucket_name

trace "Backup for local filesystem started"

trace "... removing old backups"

# Without --force, recent duplicity versions only list the sets they would delete
duplicity remove-older-than 3M --force "${DEST}" >> "${DAILYLOGFILE}" 2>&1

trace "... backing up filesystem"

# Includes/excludes are matched in order; the final exclude skips everything else
duplicity \
--full-if-older-than 1M \
--encrypt-key="${GPG_KEY}" \
--sign-key="${GPG_KEY}" \
--volsize=250 \
--include=/etc \
--include=/home \
--include=/root \
--include=/var/lib/mysql \
--exclude='/**' \
"${SOURCE}" "${DEST}" >> "${DAILYLOGFILE}" 2>&1

trace "Backup for local filesystem complete"
trace "------------------------------------"

# Send the daily log file by email
mail -s "Duplicity Backup Log for $HOST - $DATE" "$MAILADDR" < "$DAILYLOGFILE"

# Append the daily log file to the main log file
cat "$DAILYLOGFILE" >> "$LOGFILE"

# Reset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE

Save the script somewhere and give it an appropriate name, e.g. /usr/bin/duplicity-backup or /root/scripts/duplicity-backup, and make sure to chmod the script to 700. Run the script once as a test. If it works, add it to root's crontab as a daily job:

crontab -e
and then add a line like this:
0 0 * * * /path/your/script
This runs the backup every day at midnight.
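If you prefer not to edit the crontab interactively, the following bash sketch adds the same midnight entry non-interactively. It skips the entry if it is already there. `install_backup_cron` is a hypothetical helper, and the script path in the example is one of those suggested above.

```shell
#!/bin/bash
# Hypothetical helper: add "0 0 * * * SCRIPT" (daily at midnight) to the
# current user's crontab unless exactly that line is already present.
install_backup_cron() {
    local line="0 0 * * * $1" current
    current=$(crontab -l 2>/dev/null)    # empty if there is no crontab yet
    if printf '%s\n' "$current" | grep -qxF -- "$line"; then
        echo "cron entry already present"
        return 0
    fi
    # Re-install the existing entries plus the new line.
    {
        if [ -n "$current" ]; then printf '%s\n' "$current"; fi
        printf '%s\n' "$line"
    } | crontab -
}

# Example usage, as root (commented out):
# install_backup_cron /root/scripts/duplicity-backup
```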

The restore wrapper script

Clearly we need a way to restore from a backup, so use the following script to do just that:
#!/bin/bash
# Check the arguments before reading any secrets
if [ $# -lt 3 ]; then echo "Usage: $0 <date> <file> <restore-to>"; exit 1; fi

# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID=$(cat /your/aws-access-key-id-textfile)
export AWS_SECRET_ACCESS_KEY=$(cat /your/aws-secret-access-key-textfile)
export PASSPHRASE=$(cat /your/gpg-passphrase-textfile)

# Your GPG key
GPG_KEY=YOUR_GPG_KEY

# The destination
DEST="s3+http://your_s3_bucket_name"

duplicity \
--encrypt-key="${GPG_KEY}" \
--sign-key="${GPG_KEY}" \
--file-to-restore "$2" \
--restore-time "$1" \
"${DEST}" "$3"

# Reset the ENV variables. Don't need them sitting around
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE

Save the script somewhere, e.g. /usr/bin/duplicity-restore or /root/scripts/duplicity-restore, and chmod it to 700.

To do a restore simply invoke the script as follows:
duplicity-restore <date> <file> <restore-to>

Some notes on usage:

Paths are relative, not absolute: /home/username is stored in the backup as home/username. You can restore whole directories, but the destination needs to exist first. I suggest restoring your data to a temporary location and checking the restored files before you overwrite your existing files with them (unless you have actually lost data).
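Because the backup script uses / as its source, the `<file>` argument is simply the absolute path with its leading slash removed. A tiny hypothetical bash helper that does this conversion (and also drops a trailing slash) might look like this:

```shell
#!/bin/bash
# Hypothetical helper: convert an absolute path to the relative form that
# --file-to-restore expects when the backup source is "/".
to_restore_path() {
    local p="$1"
    while [ "${p#/}" != "$p" ]; do p=${p#/}; done   # strip leading slashes
    p=${p%/}                                        # strip one trailing slash
    printf '%s\n' "$p"
}

# Example (commented out):
# duplicity-restore "2009-06-27" "$(to_restore_path /home/username)" /tmp/username
```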

Example usage:

duplicity-restore "2009-06-27" home/username /tmp/username

Test the restore script!
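A simple way to test it is to restore a directory that still exists on the server into a temporary location and compare the two copies. Below is a hypothetical bash sketch that relies on `diff -r -q`. Files changed since the last backup will also show up as differences, so run it right after a backup.

```shell
#!/bin/bash
# Hypothetical helper: compare an original directory with its restored copy.
# Returns 0 and prints a confirmation if they are identical, 1 otherwise.
compare_restore() {
    local original="$1" restored="$2"
    if diff -r -q "$original" "$restored" > /dev/null; then
        echo "restore matches $original"
    else
        echo "restore differs from $original" >&2
        return 1
    fi
}

# Example usage (commented out):
# duplicity-restore now home/username /tmp/restore-test
# compare_restore /home/username /tmp/restore-test
```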

That's all, go to sleep. :)

If you think, "Oh man, no thanks, I want a complete, ready-to-use script", take a look at this Python script. It does the same job as the scripts above, but you only have to download and install it. I have never used it, but it seems to work well.

Source of this guide

I read and used the following guides, howtos, tips, ideas for my own guide:

http://www.brainonfire.net/blog/remote-encrypted-backup-duplicity-amazon-s3/

http://www.randys.org/2007/11/16/how-to-automated-backups-to-amazon-s-s3-with-duplicity/

https://help.ubuntu.com/community/DuplicityBackupHowto

http://www.cenolan.com/2008/12/how-to-incremental-daily-backups-amazon-s3-duplicity/

Thanks (and an Oscar) go to these guys.

And of course, many thanks to the duplicity developers. I also relied heavily on the duplicity man pages.

Last Updated ( Sunday, 28 June 2009 10:39 )
 
