Locally Backing Up Cluster Files

You can never have too many backups of your files! If your local computer has enough hard drive space, why not back up your cluster files there? You can check how much space your files take up on the cluster by running

du -s ~
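To compare that number against the free space on your local drive, a small check along these lines might help. This is only a sketch: the function name enough_space and the example paths are made up for illustration, and it assumes your df supports the POSIX -P and -k options.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the filesystem holding DIR has at
# least NEEDED_KB kilobytes free.
enough_space() {
    local needed_kb="$1" dir="$2" avail_kb
    # df -Pk prints a header line, then one data line;
    # column 4 of the data line is the available space in KB
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Example use (user, host, and path are placeholders):
#   needed=$(ssh user@cluster 'du -sk ~' | cut -f1)
#   enough_space "$needed" /path/to/local/backup && echo "enough room"
```

Here du -sk reports the size in kilobytes so that it can be compared directly with the df output.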

Here's how to perform routine backups using rsync to sync your local copy with what's on the cluster and cron to schedule the backups.

1. If you're using Windows, install Cygwin (http://www.cygwin.com) and make sure the vim, rsync, cron, cygrunsrv, and openssh packages are selected. If you're on OS X or Linux, these tools should already be installed and you can skip the rest of this step. On Windows, open a Cygwin prompt (it should be in your Start Menu) and run

cron-config

You will get a series of prompts. Enter: 'yes', <return>, 'yes', <your password>, <your password again>, 'yes'.

2. Open up a terminal window (cygwin prompt in Windows) and create a bash script that will execute the backup. Here's mine:

#!/bin/bash

echo "#######################quest home#######################"

rsync -avze ssh --delete jes786@quest.it.northwestern.edu:~/ /cygdrive/c/Users/James/Documents/Cluster_Backup/quest/home/

echo -e "\n\n\n#######################quest b1004#######################"

rsync -avze ssh --delete jes786@quest.it.northwestern.edu:/projects/b1004/jes786/ /cygdrive/c/Users/James/Documents/Cluster_Backup/quest/b1004/

echo -e "\n\n\n#######################palestrina#######################"

rsync -avze ssh --delete jsaal531@palestrina.northwestern.edu:~/ /cygdrive/c/Users/James/Documents/Cluster_Backup/palestrina/

date

In my script I'm backing up my home directory on Quest, my b1004 directory on Quest, and my home directory on Palestrina, each with a separate call to rsync. The -a option employs archive mode, which copies directories recursively and preserves permissions, timestamps, and symbolic links, so the backup essentially matches the source once rsync is done. -v is verbose mode, -z compresses files during transfer, and -e ssh tells rsync to use ssh as the remote shell, which is how it reaches the cluster. The --delete option deletes files in the backup that no longer exist on the cluster, making your copy truly identical to the source. The last two arguments are the source (the directory on the cluster you want to back up) followed by the destination (the location of the local backup). DO NOT MIX THESE UP! Otherwise rsync will sync in the wrong direction, and --delete will remove every file on the cluster that isn't in your local copy!
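One way to guard against swapping the two arguments is to wrap the rsync call in a function that refuses to run when the destination looks remote. The sketch below is not part of my script; the names is_remote and safe_backup are made up for illustration. It relies on the fact that rsync treats an argument as remote when a colon appears before the first slash.

```shell
#!/bin/bash
# Hypothetical guard: an rsync argument is remote when a colon
# appears before the first slash (e.g. user@host:~/).
is_remote() {
    local prefix="${1%%/*}"   # everything before the first slash
    [[ "$prefix" == *:* ]]
}

# Run the same rsync command as the backup script, but only if the
# destination is a local path.
safe_backup() {
    local src="$1" dest="$2"
    if is_remote "$dest"; then
        echo "refusing to sync: destination '$dest' is remote" >&2
        return 1
    fi
    rsync -avze ssh --delete "$src" "$dest"
}

# Example use (user, host, and path are placeholders):
#   safe_backup user@cluster:~/ /path/to/local/backup/
```

With this in place, calling safe_backup with the arguments reversed prints an error instead of touching the cluster.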

Edit my script with your usernames and directories, then try running it to make sure it works. The first run will probably take a while because it copies all your files over. Later runs are much faster, since rsync only copies files that have changed or been created since the last backup.
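Before the first real run, it may be worth previewing what rsync would do. The sketch below adds rsync's -n (--dry-run) option to the same command, so transfers and deletions are listed but not performed. The function name preview_backup and the example paths are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: show what a backup would change without
# changing anything. -n (--dry-run) makes rsync list the files it
# would copy or delete, but not actually copy or delete them.
preview_backup() {
    local src="$1" dest="$2"
    rsync -avzn -e ssh --delete "$src" "$dest"
}

# Example use (user, host, and path are placeholders):
#   preview_backup user@cluster:~/ /path/to/local/backup/
```

Lines in the output that start with "deleting" are files that --delete would remove from the destination, so an unexpectedly long list of them is a sign that something is wrong.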

3. Run crontab to edit the cron file and set up the routine backups

crontab -e

And add this to the cron file:

#!/bin/bash

PATH=/usr/bin:/usr/sbin:.

0 0 * * * /cygdrive/c/Users/James/Documents/Cluster_Backup/rsync_clusters > /cygdrive/c/Users/James/Documents/Cluster_Backup/rsync_clusters.log

The first five fields (numbers or asterisks) set the time to run the script: minute, hour, day of month, month, and day of week. Details on this crazy format can be found in the crontab manual page (man 5 crontab). My setting performs the backup daily at midnight. After the time setting is the command to be run. In my case, I run the backup script and redirect its stdout to a log file. Change your paths accordingly.
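To make the field order concrete, here is a small sketch that labels each part of a cron entry. The function describe_schedule and the /path/to/... locations are made up for illustration; the function is not part of cron.

```shell
#!/bin/bash
# Hypothetical helper: label the five time fields of a crontab entry.
describe_schedule() {
    local minute hour dom month dow command
    # read splits on whitespace; the last variable gets the rest of the line
    read -r minute hour dom month dow command <<< "$1"
    echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
    echo "command=$command"
}

# Daily at midnight, as in my cron file
describe_schedule '0 0 * * * /path/to/rsync_clusters > /path/to/rsync_clusters.log'
# Sundays at 2:30 am (day-of-week 0 is Sunday)
describe_schedule '30 2 * * 0 /path/to/rsync_clusters'
```

An asterisk means "every value" for that field, so the first entry runs at minute 0 of hour 0 on every day.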

Then you should be all set! After the first scheduled backup has run, check the log and your files to make sure it worked correctly.
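A quick way to check the log is sketched below. The function name check_backup_log is made up for illustration, and the 25-hour window is an assumption that matches a daily schedule. Since the backup script ends with date, the last line of a healthy log should show when the most recent run finished.

```shell
#!/bin/bash
# Hypothetical helper: report whether the backup log exists, has
# content, and was written recently, then show its last few lines.
check_backup_log() {
    local log="$1"
    if [ ! -s "$log" ]; then
        echo "missing or empty: $log"
        return 1
    fi
    # -mmin -1500 matches files modified less than 1500 minutes (25 hours) ago
    if [ -z "$(find "$log" -mmin -1500)" ]; then
        echo "stale: $log"
        return 1
    fi
    echo "ok: $log"
    tail -n 5 "$log"
}

# Example use (path is a placeholder):
#   check_backup_log /path/to/rsync_clusters.log
```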