May 2008 Archives
Using S3 with rdup
Well, after some playing with S3 (and signing up for it), I'm now confident that rdup can be used with S3. The basic mode of operation is as follows:
- Create a backup with rdup -c
- Save the output somewhere
- Upload it to Amazon
Backup:
rdup -c /dev/null dirs/to/backup > rdup-bin-file
s3-upload rdup-bin-file
- Download the backup file
- Put it through rdup-snap -c
Restore:
s3-down rdup-bin-file
cat rdup-bin-file | rdup-snap -c -b /tmp/restore
How this works under Ubuntu/Debian with Perl
The Perl API for Amazon S3 is already available in Ubuntu. Just
apt-get install libnet-amazon-s3-perl libwww-perl libxml-simple-perl
and you will get all the stuff you need to start developing. Note that
perldoc Net::Amazon::S3 is something you really need to read. Also
be sure to read the S3 docs from Amazon.
Generic code
S3 works with buckets in which you can place objects (each of which can be at most 5 GB in size). So before you can do anything you will need to create a bucket.
All Perl files that follow need to have the following code at the start:
#!/usr/bin/perl
use strict;
use warnings;
use Net::Amazon::S3;
Next you will need to create a connection with Amazon. This is done with your 'access-key-id' and your 'secret-access-key', which I'm not going to tell you, of course. The code looks like this:
my $s3 = Net::Amazon::S3->new(
{
aws_access_key_id => "my_public_key",
aws_secret_access_key => "my_private_key",
}
);
Create a bucket
This step needs to be done once. In code:
my $response = $s3->add_bucket( { bucket => "rdup-test1" } ) or
die $s3->err . ": " . $s3->errstr;
Upload a file to Amazon
Code:
my $localname = $ARGV[0];
my $bucket = $s3->bucket("rdup-test1");
$bucket->add_key_filename('rdup-backup-200805-23.bin', $localname,
{ content_type => 'application/binary', },
) or die $s3->err . ": " . $s3->errstr;
The arguments for add_key_filename are:
- remote filename
- local filename
- hash reference; in this case with the content_type
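Two small helpers one could use around add_key_filename: one builds a date-stamped remote name in the same style as 'rdup-backup-200805-23.bin', the other checks a local file against the 5 GB object limit mentioned above before uploading. This is only a sketch; the names remote_name and fits_in_one_object are made up, and reading "5 GB" as 5 GiB is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Assumption: the 5 GB per-object limit is read as 5 GiB.
use constant S3_MAX_OBJECT => 5 * 1024**3;

# Hypothetical helper: remote name like 'rdup-backup-200805-23.bin'
# (format %Y%m-%d). Takes a localtime()-style list, defaults to now.
sub remote_name {
    my @tm = @_ ? @_ : localtime;
    return strftime('rdup-backup-%Y%m-%d.bin', @tm);
}

# Hypothetical helper: true if $file is a plain file small enough
# to be stored as a single S3 object.
sub fits_in_one_object {
    my ($file) = @_;
    return 0 unless -f $file;
    my $size = (-s _) || 0;    # -s is false for an empty file
    return $size <= S3_MAX_OBJECT ? 1 : 0;
}
```

In the upload script this would be used as fits_in_one_object($localname) or die "too big for one object\n"; followed by $bucket->add_key_filename(remote_name(), $localname, { content_type => 'application/binary' }).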
Download a file from Amazon
Code:
my $localname = $ARGV[0];
my $bucket = $s3->bucket("rdup-test1");
my $response = $bucket->get_key_filename('rdup-backup-200805-23.bin',
'GET', $localname) or
die $s3->err . ": " . $s3->errstr;
The arguments for get_key_filename are:
- remote filename
- method (is anything other than GET allowed?)
- local filename
Web interfaces
There are also S3 web interfaces out there which will let you point and click to manage your files. I have no experience with any of them, but they seem like a nice idea. You can at least manage your backup files and buckets from your browser; rdup will never have fancy support for this as it has nothing to do with backing up.
Conclusion
You want cheap, off-site backups? With encryption? You can have them now.
For rdup-0.6.1 I will add an rdup-s3 tool to automate some of these
things. For now you will need to write your own stuff and use the
following commands to back up:
rdup -c /dev/null ~/mydir | rdup-crypt my-key > rdup-crypted.output
<upload to amazon>
<ready>
And these to restore:
<download from amazon>
cat rdup-crypted.output | rdup-crypt -d my-key | rdup-snap -c -b \
/tmp/restore
Ain't that cool?!
Building rdup under Ubuntu/Debian
rdup comes with a debian/ directory which allows you to build a .deb
package for your own use, just like source rpms (there is also a .spec
file included, if you would like to build an rpm). Currently rdup isn't
carried by Debian, but this may change in the future.
Build dependencies
There are some dependencies that you must install before you can build/compile rdup on your system. At a minimum you'll need to run:
apt-get install fakeroot dpkg-dev automake autoconf libglib2.0-dev \
libfile-copy-recursive-perl make gcc
I'm not 100% sure this is all you need: basically you need to be able to compile a C program.
Now you can download rdup, unpack it and cd to rdup's source
directory. In there, give the following command:
dpkg-buildpackage -rfakeroot -uc -us -b
This should create a Debian package; if not, please let me know. Then you should be able to run:
cd .. ; dpkg -i rdup-<version>.deb
This will install rdup on your Debian/Ubuntu system.
Amazon S3 and rdup
I was talking to a friend yesterday and he is using Amazon's S3 and other services for (among other things) web development. What is S3? Basically file storage with a web interface. You can put files there, but not a whole directory tree (i.e. it is not a full-blown Unix filesystem). You must create a bucket, which gets a unique ID, and then you can create objects in there which are up to 5 GB in size.
It turns out there are already backup programs out there which use S3. I was wondering whether rdup could be modified to also use this service.
And then it hit me
Nothing needs to be changed in rdup. The output of rdup -c is
exactly what you want to put on S3 (encrypted of course). I have no
code yet, but it will work along the following lines: ($ is the prompt)
$ rdup -c <whatever-x> | rdup-crypt > mega-encrypted-rdup-file
$ w3c -put -dest <whatever-y> mega-encrypted-rdup-file
Restore works the other way around.
rdup 0.6.0-rc1 released
I've released rdup 0.6.0-rc1.
This release incorporates a lot of new features, such as:
- hardlink support
- GPG encryption support in rdup-simple
- i18n support (no translations as of yet)
I'm releasing this to prepare for a full blown 0.6.0 release. This will happen in a week, probably. If you find any issues with this release, please let me know.
Writing rdup scripts
A backup script that can be used in conjunction with rdup needs to be able to grok the
output listed below (obviously). Writing such a script isn't really that difficult. You can look at the
example scripts that come with rdup; see rdup-snap, for instance. If you want to write your
own scripts, you will have to keep in mind the following things.
The line numbers given all refer to rdup-snap version 0.6.0 in the rdup
distribution. You can view the file online.
Sample rdup output:
+d 0751 1000 1000 11 0 /home/miekg/bin
+h 0700 1000 1000 44 21 /home/miekg/bin/acx2 -> /home/miekg/bin/cx
+h 0700 1000 1000 51 28 /home/miekg/bin/cx-hardlink -> /home/miekg/bin/cx
+l 0777 1000 1000 24 18 /home/miekg/bin/t -> tt
+- 0775 1000 1000 21 174 /home/miekg/bin/wifi
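As a first step, a script has to split such a line into its fields. Below is a minimal sketch of that, assuming the layout as it can be read off the samples above and the explanations further down (operation and type, mode, uid, gid, path length, size, path). parse_rdup_line and the hash keys are hypothetical names, not rdup-snap's own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one line of rdup output into named fields.
# Assumed layout (from the samples above):
#   <+|-><type> <mode> <uid> <gid> <path length> <size> <path>
# type: 'd' directory, '-' file, 'l' symlink, 'h' hardlink.
# For 'l' and 'h' the path field still contains "name -> target".
sub parse_rdup_line {
    my ($line) = @_;
    chomp $line;
    my ($op, $type, $mode, $uid, $gid, $plen, $size, $path) =
        $line =~ /^([+-])([-dlh]) (\d+) (\d+) (\d+) (\d+) (\d+) (.*)$/
        or return;                      # not a line we understand
    return {
        op   => $op,                    # '+' add, '-' remove
        type => $type,
        mode => oct $mode,              # "0751" is an octal string
        uid  => $uid,
        gid  => $gid,
        plen => $plen,                  # path length (%n)
        size => $size,                  # size (%s); name length for links
        path => $path,
    };
}
```

For the directory line above, parse_rdup_line returns op '+', type 'd', mode 0751 (as a number) and path '/home/miekg/bin'.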
Add or remove
In rdup's default output a '+' as the first character signals the object (file, directory or link) named in this line should be added to the backup. This case is the most complicated one, so it will be handled in its own section.
When a line starts with a '-', the object should be removed. In the case
of a directory, the whole directory (and everything below it) is removed;
all other cases are handled with a simple unlink (rm).
See lines 179 - 187 in rdup-snap.
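The removal rule for '-' lines fits in a few lines of Perl. This is a sketch, not rdup-snap's code: remove_object is a hypothetical name and File::Path's remove_tree stands in for rm -rf. The -l check is an extra precaution not mentioned above, because Perl's -d follows symlinks and a link pointing at a directory should be unlinked, not recursed into.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(remove_tree);

# Hypothetical helper for '-' lines: a directory is removed together with
# everything below it (rm -rf); anything else is simply unlinked.
# -d follows symlinks, so check -l first: a symlink that points at a
# directory is unlinked, never recursed into.
sub remove_object {
    my ($path) = @_;
    if (-d $path && !-l $path) {
        remove_tree($path);
        return !-e $path;
    }
    return unlink($path) == 1;
}
```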
Adding objects
When objects are added you need to look at the type of the object, which is either a directory, a normal file, a symlink or a hardlink.
Removing existing files
When we've reached this point we have to look at our files in the backup directory (depending on the setup we may have copied the backup from yesterday to today and the script is working on those files).
If the object we want to add already exists in the backup, we need to
remove it and replace it with the current one. Usually we can just
unlink the object, unless it is a directory, in which case we need
to rm -rf it. See lines 122 - 135 in rdup-snap.
Next we need to split on the actual type: files, directories, symlinks and hardlinks.
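The control flow described so far, as a skeleton: '-' lines go to removal, '+' lines first clear whatever is already at that path and then branch on the type. The handlers only return a description of what they would do; %add_handler and dispatch are hypothetical names, and a real script (like rdup-snap) would call its own copy, mkdir, symlink and link code in their place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical dispatch table: one handler per object type on '+' lines.
# Each handler just describes the action; a real script would do it.
my %add_handler = (
    '-' => sub { "copy $_[0]" },       # regular file
    'd' => sub { "mkdir $_[0]" },      # directory
    'l' => sub { "symlink $_[0]" },    # path still holds "name -> target"
    'h' => sub { "defer $_[0]" },      # hardlinks are created at the end
);

# Returns the list of actions for one rdup line (empty if unparseable).
sub dispatch {
    my ($line) = @_;
    my ($op, $type, $path) = $line =~ /^([+-])(\S) (?:\d+ ){5}(.*)$/
        or return;
    return ("remove $path") if $op eq '-';
    my $handler = $add_handler{$type} or return;
    # An existing object at this path is removed before the new one is added.
    return ("clear $path", $handler->($path));
}
```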
Files
Basically just copy the file from the file system over to the backup file system.
In rdup-snap File::Copy is used for this; see line 145.
Sample output:
+- 0775 1000 1000 21 174 /home/miekg/bin/wifi
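A sketch of the file case, assuming the mode has already been turned into a number (oct "0775" for the line above) and the uid/gid fields are at hand. backup_file is a hypothetical name; File::Copy is the module rdup-snap uses, but the chmod/chown handling here is an assumption about what you would want, not a transcript of rdup-snap.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper for '+-' lines: copy a regular file into the backup
# tree and apply the mode (and, when running as root, the ownership)
# recorded in rdup's output. $mode is numeric, e.g. oct("0775").
sub backup_file {
    my ($src, $dst, $mode, $uid, $gid) = @_;
    copy($src, $dst) or return 0;
    chmod($mode, $dst) == 1 or return 0;
    if ($> == 0) {                 # only root may give files away
        chown($uid, $gid, $dst) == 1 or return 0;
    }
    return 1;
}
```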
Directory
The directory size as printed by rdup is always zero, as there is no
content for a directory. Just mkdir $directory; see line 151 in
rdup-snap.
Example of a directory output:
+d 0751 1000 1000 11 0 /home/miekg/bin
Note: when a backup is made by mortal users (i.e. non-root), the
directory permissions need to be set in such a way that rdup
can still access this directory in case it needs to place more files
in it. If this is detected, a chmod u+rwx is issued.
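For directories, the u+rwx rule from the note can be folded into the chmod by OR-ing 0700 into the mode when not running as root. backup_dir is a hypothetical name, and this sketch simply leaves the forced bits on; whether the original mode should be restored afterwards is not covered here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for '+d' lines: create the directory (if needed) and
# apply the mode from rdup's output. When not running as root, u+rwx is
# forced on so that files can still be placed in it later.
sub backup_dir {
    my ($dst, $mode) = @_;       # $mode numeric, e.g. oct("0751")
    if (!-d $dst) {
        mkdir($dst) or return 0;
    }
    $mode |= 0700 if $> != 0;    # the chmod u+rwx for mortal users
    return chmod($mode, $dst) == 1;
}
```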
Symbolic link
A symbolic link can be created as-is. The target need not exist yet in the backup file system; it might even fall outside the directories that are backed up.
In any case, parsing the output of rdup is slightly different. The file size (%s) of a symlink is not relevant, so rdup overloads it to mean the path length of the symlink name. The path length (%n) is extended to include the link target as well. So in this sample output:
+l 0777 1000 1000 24 18 /home/miekg/bin/t -> tt
The 24 is the combined length of the link name, the ' -> ' separator and
the target, and 18 is the length of the link name alone. The link name
is thus the substring of the pathname up to the 18th character.
Everything from position 18+4 (skipping the four-character ' -> '
separator) to the end is the target name of the symlink. See lines
114 - 120 in rdup-snap for this parsing.
With the link name and the target name we can re-create the symlink in our backup file system.
See line 166 for the symlink creation.
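The length-based split and the link creation could look like the sketch below. split_link and make_symlink are hypothetical names; $name_len is the overloaded size field described above, and the root-prefixing of the link name is an assumption about how the backup tree is laid out. Splitting by length rather than by searching for ' -> ' also copes with link names that themselves contain ' -> '.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $SEP = ' -> ';

# Hypothetical helper: split the path field of an 'l' (or 'h') line into
# link name and target, using the link-name length rdup puts in the size
# field. Returns an empty list if the separator is not where it should be.
sub split_link {
    my ($path, $name_len) = @_;
    return if $name_len + length($SEP) > length($path);
    return if substr($path, $name_len, length($SEP)) ne $SEP;
    return (substr($path, 0, $name_len),
            substr($path, $name_len + length($SEP)));
}

# Hypothetical helper: recreate the link inside the backup tree at $root.
# The target is used verbatim; it need not exist and may point outside
# the backup. An old object at this place is assumed to be removed already.
sub make_symlink {
    my ($root, $name, $target) = @_;
    return symlink($target, $root . $name);
}
```

For example, split_link('/home/miekg/bin/t -> tt', 17) returns ('/home/miekg/bin/t', 'tt'), since /home/miekg/bin/t is 17 characters long.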
Hard link
A hardlink looks almost like a symlink in rdup's output, except for the 'h' as type. The link name and target name parsing is exactly the same as for symlinks, as you might spot from this sample output:
+h 0700 1000 1000 44 21 /home/miekg/bin/acx2 -> /home/miekg/bin/cx
The only problem with hardlinks is that you cannot create them when the target is not available. To fix this, all lines with type 'hardlink' are saved up (line 170) and the link creation is done after all files and directories are put in the backup file system. This post-processing can be found at lines 71 - 76.
Note that hardlink support is new and will be available in the yet-to-be-released rdup version 0.6.0.
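Saving up the hardlinks and creating them in a final pass could look like this. queue_hardlink and create_hardlinks are hypothetical names; the link target is assumed to be an absolute path inside the backed-up tree (as in the sample), so both names are prefixed with the backup root.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical post-processing for hardlinks: remember them while the
# rdup output is being processed, create them once everything else exists.
my @pending_hardlinks;    # [link name, target] pairs

sub queue_hardlink {
    my ($name, $target) = @_;
    push @pending_hardlinks, [$name, $target];
    return;
}

# Creates all queued links below $root; returns the number that failed
# (e.g. because the target never showed up in the backup).
sub create_hardlinks {
    my ($root) = @_;
    my $failed = 0;
    for my $pair (@pending_hardlinks) {
        my ($name, $target) = @$pair;
        link($root . $target, $root . $name) or $failed++;
    }
    @pending_hardlinks = ();
    return $failed;
}
```

Because link() fails when its target is missing, queueing first and linking last is what lets an 'h' line appear before the file it points to.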

