Last updated 2011/09/02
Suffield Academy provides networked disk space for all of its users, and encourages its use as part of a regular backup strategy. We back up the entire disk array nightly to another machine as part of our disaster recovery plan.
Unfortunately, we do not have enough space to archive each of these full nightly backups. So, while we are protected against the server crashing, we do not have the ability to recover files from a particular point in time.
To remedy this problem, we designed a custom backup script to satisfy the following criteria:
We found that the best way to accomplish this was to use a collection of scripts to wrap existing utilities and provide the functionality we needed.
We use rsync as the tool to perform the backups, and it is the workhorse of our system. Rsync contains configurable settings to allow the exclusion or inclusion of particular files, and we can script the automatic exclusion of users who are over a particular quota.
Meanwhile, Rsync also includes the ability to hard-link against existing files (a method later used by Apple's Time Machine), so that you can store full file system copies, but only have to use disk space for the files that have changed. While not quite as good as copy-on-write filesystems that operate at the block level, this approach is very portable across operating systems.
Finally, we wrap Rsync in custom scripts that archive completed backups, timestamp them, and weed out older backups to conserve space.
The final collection of scripts is written in Perl and Bash, with very few module dependencies. The scripts include full documentation and are configurable.
(This section deals with the design considerations for the script. If you just want to start using the script, skip down to the usage section.)
While searching for methods to create snapshot backups, we found an excellent strategy for backing up only incremental changes. It involves using hard links on the filesystem to store redundant (i.e., unchanged) files, and rsync to transfer only files that change between backup sessions. Please read the paper for more information; a discussion of the technique is beyond the scope of this document.
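As a rough illustration of the technique (this is a minimal sketch, not our production script; the paths and the "latest" pointer are hypothetical), a single hard-linked snapshot looks something like this:

# Minimal sketch of a hard-link snapshot (hypothetical paths, not our script)
SRC="/Volumes/data/"
DEST="/backups/example"
TODAY=$(date +%Y-%m-%d)

# Copy only the files that changed; unchanged files become hard links
# into the previous snapshot pointed to by "latest"
rsync -aH --link-dest="$DEST/latest" "$SRC" "$DEST/$TODAY/"

# Move the "latest" pointer to the snapshot that just finished
ln -sfn "$DEST/$TODAY" "$DEST/latest"

Each snapshot directory then appears to hold a full copy of the data, while only changed files consume new disk space.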
We investigated existing solutions that use this strategy, including a program called rsnapshot, which looked very promising. Based on the features and existing code base, we decided to adopt this strategy for our backups.
We would have liked to use an existing solution for backing up, but encountered a few significant issues that could not be resolved without writing custom code:
To attain these goals, we use Rsync 3 as the core of the backup system, and wrap it with shell scripts that provide the additional policies that we need to back up the correct information and retain it.
In the sections below, we discuss each of the issues in more detail.
Most Macintosh computers running OS X use the HFS+ filesystem for storing data. HFS+ has two features which frustrate file transfers, especially to non-Mac OS X systems: file metadata (especially type and creator codes), which have no counterpart on other file systems, and forked files, which split files into several unique parts. Because other filesystems do not use this approach, storing HFS+ data correctly becomes much more difficult.
There is a great utility called Backup Bouncer that automatically tests backup tools and their ability to preserve these sorts of data. Based on our testing, we discovered that a patched build of Rsync 3 provided the best retention of metadata while also supporting a very efficient transfer mode (differential copies with hard-links at the target).
To save space in our backups, we needed the ability to exclude files from the backup based upon a user's quota status. We do not enforce a hard quota limit on our fileserver (to allow users to store large files temporarily), but we didn't want to waste space with large files that didn't need to be backed up.
When backing up user home directories, the script communicates with our Open Directory server to find users that are over quota. If a user is over their quota, files are excluded from the backup (until the non-excluded files fit within their quota). When a user's files are excluded, their e-mail address is queried and the user is notified that certain files were not backed up.
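The exclusion itself is plain rsync machinery. Conceptually it looks something like the sketch below; the assumption that users_over_quota emits rsync exclude patterns on stdout (and the paths shown) is ours for illustration, so check the script's own documentation for the real interface:

# Hypothetical sketch: build an exclude list for over-quota users, then
# have rsync skip those files during the backup
./users_over_quota > /tmp/quota_excludes.txt
rsync -aH --exclude-from=/tmp/quota_excludes.txt /Volumes/Users/ backups.example.com::users/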
Rsync can perform backups via the network, and we have designed our scripts to allow this behavior as well.
Because our script performs various housekeeping duties (rotating directories, locating old directories to link against, etc.), remote backups must adhere to a specific way of doing things in order to work properly.
We do this by using Rsync's daemon mode, and using a pre/post execution script. The script is automatically run by rsync before and after the transfer, and it takes care of creating and renaming directories, ensuring that transfer options are set correctly, and other housekeeping. Because it's run on the server side, the sender (client) simply uses regular rsync without needing to worry about the policies on the server.
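For reference, rsync's daemon passes the transfer details to these hooks through environment variables; a bare-bones skeleton (not our rsyncd_prepost script, just the hook mechanism) might look like this:

#!/bin/sh
# Skeleton pre/post hook: rsync runs the same command before and after a
# transfer.  RSYNC_MODULE_NAME and RSYNC_MODULE_PATH are set for both calls;
# RSYNC_EXIT_STATUS is only set for the post-transfer call.
if [ -z "$RSYNC_EXIT_STATUS" ]; then
    # pre-xfer: create/rename directories so the transfer has a place to land
    echo "preparing module $RSYNC_MODULE_NAME at $RSYNC_MODULE_PATH"
else
    # post-xfer: timestamp and rotate the snapshot only if the transfer succeeded
    [ "$RSYNC_EXIT_STATUS" -eq 0 ] && echo "rotating snapshot for $RSYNC_MODULE_NAME"
fi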
The scripts depend on the following packages to work correctly:
wget 'http://rsync.samba.org/ftp/rsync/rsync-3.0.6.tar.gz'
wget 'http://rsync.samba.org/ftp/rsync/rsync-patches-3.0.6.tar.gz'
tar -zxf rsync-3.0.6.tar.gz
tar -zxf rsync-patches-3.0.6.tar.gz

The final step above merges the patches into the main source tree. Next, move into the source tree and apply the Mac OS X metadata patches:
cd rsync-3.0.6
patch -p1 <patches/fileflags.diff
patch -p1 <patches/crtimes.diff

Now configure the sources:
./prepare-source
./configure

And finally, build and install:
make
sudo make install

You should be all set with a patched version of rsync (installed to /usr/local/bin by default). Run rsync --version to confirm the version and patches.
Rsync is run in "daemon mode" on the server you wish to use as the destination for backups. The daemon is normally started as root (so that it can bind to a privileged port), and then an alternate user is specified to own the transferred files.
You can use the standard processes for starting a daemon on your system (init, daemontools, rc, etc). If you use Mac OS X as your server, we have Launchd plist files available for starting and running rsync. There are two files:
The first launches the rsync daemon, but only if the path where the backups go is present (we use an external disk for our backups, so we don't want the daemon to run if the disk isn't mounted).
The second watches the backup path and kills rsync if it is no longer available (the disk is unmounted). You'll need to customize the paths if you plan to use this functionality. If you don't need to worry about the path unmounting, you can just use the first script to keep the daemon alive.
Finally, Mac OS X users should ensure that the destination directory for their backups has Ignore Ownership and Permissions turned OFF. You can check this by choosing "Get Info..." on the destination volume. If privileges are not preserved, then rsync will assume that all the files have changed ownership (since, as far as it knows, they have), and every file will be retransmitted, making the backup non-incremental.
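If you'd rather check from the command line, something like the following should work (the volume name here is just an example); look for "Owners: Enabled" in the output:

diskutil info /Volumes/Backups | grep -i owners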
There are two main scripts to use as part of this system: one for the client (sender) and one for the server (receiver). Others are available for special tasks. Each script includes documentation in comment form, but we also have a brief introduction below:
rsyncd_prepost (Download)
rsync_snapshot (Download)
users_over_quota (Download)
On the client side, there are no configuration files. All configuration is done directly when you invoke rsync, so you should modify the script that uses rsync to have whatever options/includes/excludes you want.
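For example, a client-side invocation against the "test" module defined later in this document might look something like the sketch below. The password file path, the --delete flag, and the choice to sync to the module root are assumptions for illustration; adapt them to match your own rsync_snapshot script:

# Hypothetical client-side backup run against the daemon's "test" module
/usr/local/bin/rsync -aNHAX --delete \
    --password-file=/etc/rsync.pass \
    /Volumes/data/ \
    test@backups.example.com::test/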
On the server side, the rsyncd_prepost script shares the same configuration file as the rsync daemon (typically, /etc/rsyncd.conf). Our script parses the rsyncd.conf file to get the same options that rsync uses. Additionally, you can specify settings for the rsyncd_prepost script by placing special comment lines in the configuration file that our script understands (see below for an example).
We've included a sample rsyncd.conf file below. This file defines a single transfer target called "test". We've included detailed comments for each option so you know why the values are specified in this particular format.
Note: the rsync daemon re-reads its configuration file for every connection, so it is not necessary to signal (e.g., HUP) the daemon if you change the config.
# /etc/rsyncd.conf

# This option forces rsync to write its PID out to a file.  Our
# launchd watcher script uses this PID to send signals to rsync, so
# this path should match that used by launchd.
pid file = /var/run/rsyncd_snapshot.pid

# Ordinarily, the user that starts the daemon process also owns all
# the files from the transfer.  We choose a non-root user here to own
# all the files.  The prepost script also uses these values when
# creating and setting ownership on directories.
#
# See also "fake super" and "incoming chmod" below
uid = rsync_user
gid = rsync_user

# Additional safety measure: change to the destination root of the
# transfer before executing any operations.
use chroot = yes

# use syslog instead of plain file logging
syslog facility = ftp

# enable plain file logging for more detail and debugging
#log file = /tmp/rsyncd.log
#transfer logging = yes
#log format = "%m:%u %h %o %f (%b/%l)"

# The following file contains the usernames and passwords for
# connections to the daemon.  File format is username:password, one per
# line.  We use this to restrict transfer modules to specific users.
secrets file = /etc/rsyncd.secrets

# Normally, to preserve ownership and permissions, rsync must run as
# root.  However, by using "fake super", rsync stuffs all file
# metadata into xattrs on the files and lets them be owned by a
# non-root user.  Additionally, this allows you to store metadata
# not supported by the native file system.
fake super = yes

# When using fake super, you are not running the transfer daemon as
# root.  This means that certain operations on the destination files
# can fail (such as setting attributes) if the permissions on the
# files are not permissive enough.  For example: if a file does not
# have read permission for "user", when it is transferred to the
# destination the rsync daemon user (who is not root) cannot read its
# xattrs to determine if the file changed, resulting in an error.
#
# To work around this, we set a user-permissive umask (or chmod) on
# the receiving side to guarantee that the rsync daemon can at the
# very least read and write to files it owns.  Be aware that when you
# restore files back to a host, the permissions may be (slightly) more
# permissive, though it's rare that you actually have a file that the
# owner cannot read or write to...
incoming chmod = u+rwX

# We specify our pre/post script for all transfers, using this file
# (rsyncd.conf) as the first argument so it can find all configuration
# options.
pre-xfer exec = /usr/local/bin/rsyncd_prepost /etc/rsyncd.conf
post-xfer exec = /usr/local/bin/rsyncd_prepost /etc/rsyncd.conf

# to prevent multiple connections overwriting each other, only allow
# one connection at a time (note that you must specify a lock file for
# each module, as the connection limit is enforced on a per-lock-file basis).
max connections = 1

# allow shares to be writeable (otherwise it's hard to back up to them!)
read only = no

# by default, don't advertise module names
list = no

# Comments that begin with "rsyncd_prepost" are read by the prepost
# script and used to set values in that script.  Because they are
# comments, they are ignored by the rsync daemon.

# date format for snapshot directory names (per strftime())
# rsyncd_prepost: dateformat=%Y-%m-%d

# The pre/post script renames backups by the date and time they
# complete.  Since backups take a variable amount of time to finish,
# it can be helpful to round off the time to the nearest hour/day/etc.
# Specify a time in seconds that you would like to round to (we
# use 1 day, or 86400 seconds).
# rsyncd_prepost: dateround=86400

# This is a module named "test".  In general, you want one module per
# set of files you're going to back up.  Most of our modules are for a
# single server, though some larger servers use multiple modules to
# spread out the organization of the files.
[test]
    comment = This is the comment

    # The path MUST always end in "rsync", and the level above
    # that should match the name of the module.  The pre/post
    # script will create the module dir and any necessary subdirs
    # if you run the prepost script with the module name as the
    # last argument.
    path = /Volumes/encrypted/test/rsync

    # To limit the number of connections to a particular module,
    # you must specify a lock file that is unique to that module.
    # Otherwise, the shared connection limit is global (for all
    # modules) and you'll likely get conflicts.
    lock file = /var/run/rsync_snapshot_test.lock

    # List any users from /etc/rsyncd.secrets that should have
    # access to this module
    auth users = test

    # List any machines that should have access to this module
    # (typically, only the machines that are sending the backups)
    hosts allow = 192.0.2.1

    # If desired, you can specify per-module options for the
    # pre/post script here as well.  The lines below define how long
    # snapshots are kept, and how far apart in time they are spaced.
    # See the pre/post script for more details.
    # rsyncd_prepost: snapshotpreserve=60 30
    # rsyncd_prepost: snapshotpreserve=120 60
    # rsyncd_prepost: snapshotpreserve=300 -1
To recover files from the backup server, there are two basic options.
The first is to create a new module specification that includes the path to the snapshot directory you wish to recover from. You may specify an additional username or host restriction if you're recovering from a different host. Be sure to OMIT the "pre-xfer" and "post-xfer" lines from the config so the server doesn't think it's a live backup request.
However, that approach is cumbersome, and requires editing and saving files based on parameters at the time of recovery. An easier approach is to tunnel the rsync request via SSH and specify the path to recover from on the fly.
To do this, you need to have SSH access to the server, and that SSH user must have (at least) read permissions to the module folder you wish to restore from.
You can then use rsync tunneled through SSH to connect to the machine and restore the files. Consider the following example (which should be typed all on one line, or use continuation backslashes as we have here):
sudo /usr/local/bin/rsync -aNHAXv -e ssh \
    --rsync-path="/usr/local/bin/rsync --fake-super" \
    ssh_user@backups.example.com:/path/to/module/snapshot/2009-10-11/ \
    /tmp/restore_to_here/
The rsync options (-aNHAX) should match those used when you created the backup from the sender. The -e ssh tells rsync to use the SSH tunnel instead of a direct connection. If your server normally has the fake super option set in its rsyncd.conf file, you need to tell the tunneled rsync daemon to turn it on as well using the --rsync-path option as we have. Finally, you specify the full source and destination path. In the case of the most recent backup, it will live in .../module/rsync/complete/. Otherwise, you can specify the snapshot to restore from with .../module/snapshot/(date)/ as we have above.
Originally, we used our script with a server running Mac OS X, and used hard links to create the snapshot backups. Since that time, several other filesystems have come out (notably BTRFS and ZFS) that allow for easy snapshotting of an entire filesystem (or subdirectory), thus eliminating the need for hard links. This can speed up certain operations, especially calculating file system usage and deleting snapshots.
BTRFS is not yet stable enough for us; our script supports it, but as of this writing (Summer 2011) it had issues with storing all the metadata and filling up too quickly. Thus, we've moved to ZFS.
You can read about the virtues of ZFS elsewhere. What follows is a rough outline of how to set up a FreeBSD server with ZFS to use with our script.
ZFS uses a lot of RAM, so get a server with plenty. Additionally, you can speed up some operations through caching, so you might consider having an SSD drive for an L2ARC. Our server is a 36-bay quad-core with 24GB of RAM. One drive boots the machine, one drive is a 120GB SSD, and the rest are for ZFS.
Obtain the installation media for FreeBSD (we use 8.2 in these instructions). Format the boot drive and use the recommended partitioning. Leave any additional drives unconfigured. Reboot into the installed system and ensure that everything is as you like it:
Start by populating (or getting the latest version of) the ports tree. Use "extract" for the first time, and "update" for all subsequent runs:
portsnap fetch extract
portsnap fetch update
Then, install the ports we rely on (run make && make install && make clean in each of these directories):
/usr/ports/shells/bash
/usr/ports/net/rsync
/usr/ports/editors/emacs-nox11
/usr/ports/sysutils/screen
/usr/ports/devel/subversion
/usr/ports/net-mgmt/net-snmp
/usr/ports/sysutils/apcupsd
/usr/ports/lang/perl5.12 (included in subversion)
/usr/ports/sysutils/p5-Unix-Syslog
(You only need perl, p5-Unix-Syslog, and rsync for our script -- the others are packages we use at Suffield.)
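If you'd rather not walk each directory by hand, a small loop does the same thing; this is just a convenience, shown here with only the ports our script needs:

# Build and install just the ports required by the backup script
for port in net/rsync lang/perl5.12 sysutils/p5-Unix-Syslog; do
    (cd /usr/ports/$port && make && make install && make clean)
done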
To ensure that devices remain consistent across reboots, we partition and label all devices before creating a ZFS pool. This is a real concern; SCSI devices are enumerated at boot time according to how the controller spins them up, and that may change (at least, it did for us...) when the system boots. Because we're planning to encrypt the drives, having a non-encrypted header to identify the drives is helpful.
You can use glabel to label raw disks, but we've chosen to partition the drives so they have a consistent size, block alignment, and partition table with label.
First, figure out what drives you have. SCSI subsystem users can say:
camcontrol devlist
That will show you the target and lun IDs, which should enable you to label them correctly. Pick a label scheme that will work with the physical setup of your machine. In our case, the chassis has 36 drive bays, with the possibility of expanding to external JBODs. We number the drives as "cXXdYY" (XX is 0 for all our current drives, as they're all in the chassis).
Assuming you have disks that are all uniform in the controller, run the following (WARNING: THIS WILL REPARTITION AND ERASE THE DRIVE):
gpart create -s GPT da1
gpart show -l da1
That will format the drive and then show the space available. Assuming the drive uses 512-byte sectors, find an even multiple of 8 (GELI recommends a 4096-byte sector size) that fits in the available space. If you want to be extra safe, leave some space at the end of the drive (this allows you to mix and match drive types that might have small size differences).
Here's an example: using gpart show, you see that the available 512-byte block count is 3907029101. Divide this by 8 and truncate to get 488378637. Multiply this by 8 again to get your new "even multiple of 4096" sector count of 3907029096. That's your maximum size if you wanted to use the whole disk. We'll shave 10MiB off the end to give some wiggle room; that's 20480 512-byte blocks. Subtracting that from our max value gives us a final block count of 3907008616.
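The arithmetic is easy to script if you're preparing many drives; here's the example above expressed as a quick shell calculation:

# Compute a 4096-byte-aligned partition size, leaving ~10MiB of slack
BLOCKS=3907029101                   # 512-byte blocks reported by "gpart show"
ALIGNED=$(( (BLOCKS / 8) * 8 ))     # round down to a multiple of 8 blocks (4096 bytes)
SIZE=$(( ALIGNED - 20480 ))         # subtract 10MiB (20480 blocks) of wiggle room
echo $SIZE                          # 3907008616 for this example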
You can now add a partition to the device with this size, and give it a label that makes sense. Again, check dmesg or camcontrol to make sure that the device gets a sane name (in the example below, we're labelling a drive as "2", but it's seen by the kernel as "da7"):
gpart add -s 3907008616 -t freebsd-zfs -l c00d02 da7
If you have more disks that are the exact same size, go ahead and use the same gpart create and gpart add commands on them with the same sector size value.
When you're done, you should have a series of partitions listed in /dev/gpt/ with your custom labels. This will give you a stable set of names to use.
In case our backup server gets stolen, we encrypt the underlying devices using GELI (ZFS with native encryption isn't supported in our version of FreeBSD).
Create a home for the key files:
mkdir /root/geli-keys
chmod 700 /root/geli-keys
cd /root/geli-keys
Now, create a key for every disk device in your box (hopefully they're sequentially numbered...). In our example, we'll generate keys for /dev/gpt/c00d02 through /dev/gpt/c00d15 (these are the custom labels we set up above).
for ((d=2; d<16; d=d+1)); do
    dev=$(printf 'c00d%02d' $d)
    dd if=/dev/random of=/root/geli-keys/${dev}.key bs=64 count=1
    chmod 600 /root/geli-keys/${dev}.key
done
Back these keys up!! Without them, you can't get your data back! If you want to store the key values as ASCII, do something like this:
for ((d=2; d<16; d=d+1)); do
    dev=$(printf 'c00d%02d' $d)
    perl -ne 'print unpack "H*",$_' < /root/geli-keys/${dev}.key
    echo -n ' '
    md5 /root/geli-keys/${dev}.key
done
To restore a key from its hex dump, reverse the process (substituting the saved hex string for "hex"):

perl -e 'print pack "H*", "hex"' > key
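For instance, a hypothetical restore of the key for c00d02 (the hex string below is a placeholder for the one you recorded) might look like:

# Rebuild the key file from the recorded hex string (placeholder shown here)
perl -e 'print pack "H*", "0f1e2d3c..."' > /root/geli-keys/c00d02.key
chmod 600 /root/geli-keys/c00d02.key
# Verify the result against the checksum you recorded earlier
md5 /root/geli-keys/c00d02.key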
Now you can initialize the encryption on each device (WARNING: this overwrites whatever's on those partitions, though they're probably empty since you just repartitioned the drives):
for ((d=2; d<16; d=d+1)); do
    dev=$(printf 'c00d%02d' $d)
    geli init -s 4096 -K /root/geli-keys/${dev}.key /dev/gpt/${dev}
done
You'll be prompted for a passphrase. For simplicity's sake, use the same one for all the drives.
The -s 4096 uses a 4k sector size, which helps with the encryption (fewer IV calls because the blocks are bigger). ZFS should detect this larger sector size and work properly; to double-check, run zdb | grep shift after creating the zpool and confirm that the value is 12 (corresponding to 4096) as opposed to 9 (512).
Now you'll need to attach the devices (and you'll need to do this every time the system boots):
for ((d=2; d<16; d=d+1)); do
    dev=$(printf 'c00d%02d' $d)
    geli attach -k /root/geli-keys/${dev}.key /dev/gpt/${dev}
done
Note that you can script this (see below).
Mount all the encrypted filesystems (you should have /dev/gpt/*.eli devices for each drive available).
To prevent any sensitive data from leaking from RAM onto disk, you should also encrypt swap. First, edit /etc/fstab and add .eli to the current swap partition:
# Device                Mountpoint      FStype  Options         Dump    Pass#
/dev/da0s1b.eli         none            swap    sw              0       0
Then, add the following to /etc/rc.conf to auto-encrypt any swap partitions on the system (256-bit AES is the default):
geli_swap_flags="-s 4096 -d"
Now that you have the "raw" encrypted devices, you can create the zpool.
A quick digression: the L2ARC is a level-2 cache for (meta)data. ZFS has a built-in ARC using RAM, but a fast drive (such as an SSD) can be used to speed access by giving you much more space to store this information.
Our backup server primarily uses metadata to compare files (it doesn't bother syncing files if everything about them matches), so we set our L2ARC to only cache file metadata. The data itself isn't too important, as it will just get overwritten and then probably never accessed again.
The pool creation command in this section assumes a L2ARC device with metadata storage. If you don't want that, leave it out.
We don't encrypt the L2ARC, as it's only metadata. If that's sensitive information to you, go ahead and encrypt that as well.
Since these are just backups, we don't care about access times, so we turn that off. We try to save even more space by compressing the data. We make the snapshot directories visible so our maintenance scripts can see them (and we can easily pull data from them). Because we anticipate our workload to be metadata-intensive, we set both the primary (RAM) and secondary (L2ARC) caches to only handle metadata.
To create the pool (named "rsync"), run the following (one command):
zpool create \
    -o autoreplace=on -o cachefile="/root/geli-keys/rsync.cachefile" \
    -O atime=off -O compression=on -O snapdir=visible \
    -O primarycache=metadata -O secondarycache=metadata \
    rsync cache gpt/c00d01 spare gpt/c00d08.eli gpt/c00d15.eli \
    raidz2 gpt/c00d02.eli gpt/c00d03.eli gpt/c00d04.eli \
        gpt/c00d05.eli gpt/c00d06.eli gpt/c00d07.eli \
    raidz2 gpt/c00d09.eli gpt/c00d10.eli gpt/c00d11.eli \
        gpt/c00d12.eli gpt/c00d13.eli gpt/c00d14.eli
Note on compression: compression is good, but can cause performance degradation (especially when used in conjunction with GELI for encryption). Consider only using compression on the sub-volumes that would benefit from it (textual data).
You can add the -n flag after create to just see a preview of what would change.

If the command succeeds, you should have a new zfs filesystem at /rsync!
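Following the compression note above, one approach (a sketch, with a hypothetical dataset name) is to turn compression off at the pool level and enable it only on datasets that hold compressible, textual data:

# Turn compression off pool-wide, then enable it per-dataset where it helps
zfs set compression=off rsync
zfs create -o compression=on rsync/textual-data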
If you reboot the server, you'll need to reconnect all of the encrypted devices. This is best done through scripted automation. The following script will attach the encrypted drives and then attempt to import (mount) the zfs pool:
#!/usr/bin/env bash

# Mount encrypted ZFS partition after prompting for password

PASSPHRASE=''
read -rs -p "Enter ZFS Encrypted Pool Passphrase: " PASSPHRASE

echo -e "\n\nMounting encrypted pool members..."
for ((d=2; d<16; d=d+1)); do
    dev=$(printf 'c00d%02d' $d)
    echo "  $dev..."
    echo -n $PASSPHRASE | geli attach -k "/root/geli-keys/${dev}.key" -j - "/dev/gpt/${dev}"
    if [ ! -e "/dev/gpt/${dev}.eli" ]; then
        echo "Device ${dev} did not mount encrypted disk properly!"
        exit 1
    fi
done

echo -e "\nImporting pool 'rsync'..."
zpool import rsync

echo -e "\n\n"
zpool list
echo ""
The rsync port installs an init script, but you must enable the daemon by putting rsyncd_enable="YES" in your /etc/rc.conf file.
You may want to prevent rsyncd from starting unless the zpool is mounted. To do this, add the following to /usr/local/etc/rc.d/rsyncd, just above run_rc_command at the bottom of the file:
if ! zpool list -H rsync >/dev/null 2>&1; then
    echo "ZFS rsync pool not available; refusing to start rsyncd"
    rsyncd_enable="NO"
fi
Download the rsyncd_prepost script from this distribution and install it on the filesystem. Then create an rsyncd.secrets file in /usr/local/etc with passwords for the different backup modules. Finally, create an rsyncd.conf file in /usr/local/etc/ that references the prepost script (and other paths in /usr/local/etc). It should configure "zfs" as the snapshot type, and point to the prepost script and secrets file.
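As a reminder of the secrets file format (the module names and passwords below are made up), it is just username:password pairs, one per line:

test:ChangeMeToARealPassword
fileserver:AnotherMadeUpPassword

Make sure the file is readable only by the daemon user (e.g., chmod 600), since rsync's default strict modes will refuse a world-readable secrets file.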
You should now be ready to create the directory structure for each of the modules. You can do this by grepping the configuration file and using the script to build the base directories:
for module in $(egrep '^\[' /usr/local/etc/rsyncd.conf | tr -d '[]'); do
    /usr/local/bin/rsyncd_prepost /usr/local/etc/rsyncd.conf $module
done
The rsync destinations should now be set, and you should be ready to transfer backups to the machine!