Needless to say, backups should be a cornerstone of your server security policy. There are a number of tools which can help us with this, but I wanted to begin with one very simple and fundamental command-line tool: tar.
Tar is an acronym for tape archive, and has long been one of the basic built-in Linux tools. It takes a collection of files or directories and combines them into a single file, called a tarball. Tar does not compress the files itself, so it is usually used together with a compression tool. There are a number of tools to choose from (gzip, bzip2, 7zip, rzip, etc.), and the tradeoff to consider is compression ratio versus time: how well the tar file can be compressed down, giving you a smaller file, against how long it will take to compress and decompress. Below is a graphic demonstrating this principle.
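As a rough illustration of that tradeoff, you can compress the same tarball with the two most common tools, gzip and bzip2, and compare the results. This is just a sketch using throwaway sample data and paths:

```shell
# Create some sample data and a tarball to experiment on (example paths)
mkdir -p /tmp/tar_demo
cd /tmp/tar_demo
seq 1 100000 > sample.dat
tar cf sample.tar sample.dat

# Compress with each tool, keeping the original for comparison
gzip -k sample.tar        # produces sample.tar.gz
bzip2 -k sample.tar       # produces sample.tar.bz2

# Compare the resulting sizes; bzip2 typically compresses smaller,
# while gzip is typically faster
ls -l sample.tar sample.tar.gz sample.tar.bz2
```

Prefix the compression commands with `time` if you also want to measure the speed side of the tradeoff.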
For our purposes, I am going to use the well-established gzip compression tool. To begin with, I am going to tar the home directory into a file called home_files.tar.gz. Move to the /tmp directory and run tar with the 'cvzf' flags. These specify that you want to create a tarball (c), enable verbose mode (v), use gzip compression (z), and specify the file name (f):
cd /tmp
tar cvzf home_files.tar.gz /home/jupiter
To exclude specific file types, music files (.mp3) for instance, you would add the --exclude parameter:

tar czvf home_files.tar.gz /home/jupiter --exclude='*.mp3'
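To check that the exclusion actually worked, you can list the archive afterwards. A minimal sketch using throwaway files and paths:

```shell
# Build a small directory tree containing an .mp3 (example paths)
mkdir -p /tmp/exclude_demo/music
echo "notes" > /tmp/exclude_demo/notes.txt
echo "audio" > /tmp/exclude_demo/music/song.mp3

# Archive it, skipping anything matching *.mp3
tar czf /tmp/exclude_demo.tar.gz --exclude='*.mp3' -C /tmp exclude_demo

# The listing should show notes.txt but not song.mp3
tar tzf /tmp/exclude_demo.tar.gz
```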
You can then move this file to a USB drive or back it up to your Amazon S3 share.
List and Extract Contents from a Tarball
To list the contents of a tarball, use the t parameter. You are probably going to need to pipe this through the less command in order to read through the contents one screen at a time:

tar tvf home_files.tar.gz | less
To extract/decompress a tar.gz file, run:
tar xvf home_files.tar.gz
And to extract only a single file (file.txt) from the tarball, run:
tar zxvf home_files.tar.gz file.txt
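Putting these pieces together, here is a small end-to-end sketch (file names and paths are examples) that creates an archive and pulls a single file back out into a separate directory using the -C flag:

```shell
# Create a throwaway archive containing two files
cd /tmp
mkdir -p extract_demo && cd extract_demo
echo "hello" > file.txt
echo "world" > other.txt
tar czf demo.tgz file.txt other.txt

# Extract only file.txt, placing it under ./restored
mkdir -p restored
tar zxvf demo.tgz -C restored file.txt

ls restored    # contains file.txt only
```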
A very handy use of this is in scripting a daily backup of a directory, e.g. your web directory. Note that this will do a full backup, excluding any file types you specify, and is not capable of incremental backups. To set this up, open a new script:
cd /home/user/scripts
nano daily_web_backup.sh
And enter the following:
#!/bin/bash
# Script to back up web directories, excluding file types listed in FILE.txt
DATE=$(date +"%Y-%m-%d")
# Adjust /var/www to your own web root
tar czf /home/user/backups/$DATE-web-directory.tgz --exclude-from='/home/user/scripts/FILE.txt' /var/www
Then set this to run daily, by either adding an entry to your crontab:
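For example, an entry along these lines (the 2 a.m. run time and paths here are just examples) added via crontab -e:

```
# m h dom mon dow command
0 2 * * * /home/user/scripts/daily_web_backup.sh
```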
or move it to your /etc/cron.daily folder. Adding it to your crontab lets you control the time at which it runs, whereas scripts in /etc/cron.daily run at a fixed time each day, defined in /etc/crontab. Note there are also /etc/cron.weekly and /etc/cron.monthly directories which will run the script according to those schedules.
A good idea is to automate this as much as possible. For instance, have your script check the backup directory for tarballs older than 5 days and delete those. Then, after it creates the new tarball, have it run s3cmd to synchronise with your Amazon S3 bucket. That way you always have onsite and offsite backups, without taking up too much space:
#!/bin/bash
# Delete tarballs in the backup directory older than 5 days
find /home/user/backups -name '*.tgz' -mtime +5 -delete
DATE=$(date +"%Y-%m-%d")
# Adjust /var/www to your own web root
tar czf /home/user/backups/$DATE-web-directory.tgz --exclude-from='/home/user/scripts/FILE.txt' /var/www
s3cmd sync /home/user/backups s3://my_s3_bucket
Below are tables showing the most common flags and commands you will need to use.
| Flag | Description |
|---|---|
| c | create archive file |
| C | specify destination folder to extract archive to |
| f | filename of archive file |
| j | filter archive through bzip2 |
| r | append (add) files/directories to existing archive file |
| t | view contents of archive file |
| v | show progress of file archiving |
| W | verify archive file |
| x | extract archive file |
| z | filter archive through gzip |
| --wildcards | specify pattern in tar command |
| --exclude='PATTERN' | exclude e.g. '*.iso' or './folder'; separate statement needed per file/folder type |
| --exclude-from=/path/to/file | use an exclusions file with one pattern per line |
| Task | Command (.tar) |
|---|---|
| Create | tar cvf archive.tar /home/public_html |
| Extract | tar xvf archive.tar [-C /home/public_html/videos/] |
| List contents | tar tvf archive.tar |
| Extract single file | tar xvf archive.tar file.sh, or tar --extract --file=archive.tar file.sh |
| Extract multiple files | tar xvf archive.tar "file 1" "file 2" |
| Extract group of files using wildcard | tar xvf archive.tar --wildcards '*.php' |
| Add files/directories to existing archive | tar rvf archive.tar file.txt |
| Verify archive | tar tvfW archive.tar |
| Task | Command (.tar.gz) |
|---|---|
| Create | tar cvzf archive.tar.gz /home/public_html |
| Extract | tar xvf archive.tar.gz [-C /home/public_html/videos/] |
| List contents | tar tvf archive.tar.gz |
| Extract single file | tar zxvf archive.tar.gz file.xml, or tar --extract --file=archive.tar.gz file.xml |
| Extract multiple files | tar zxvf archive.tar.gz "file 1" "file 2" |
| Extract group of files using wildcard | tar zxvf archive.tar.gz --wildcards '*.php' |
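One caveat with the append (r) flag from the tables above: tar can only append to uncompressed .tar archives, not to compressed .tar.gz files. A quick sketch with throwaway files:

```shell
# Create a throwaway uncompressed archive with one file in it
cd /tmp
mkdir -p append_demo && cd append_demo
echo "one" > a.txt
echo "two" > b.txt
tar cf archive.tar a.txt

# Append a second file; this would fail against a .tar.gz archive
tar rvf archive.tar b.txt

# Both files are now listed
tar tf archive.tar
```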