Tar Tutorial

From Center for Cognitive Neuroscience
Revision as of 03:10, 16 January 2014 by Ccn admin (talk | contribs) (23 revisions)

The "I just want something right now!" Section

Pros and Cons

Pros
Easy, quick, simple, minimal reading
Cons
If done regularly, this can take up a lot of space and requires more time spent actually backing up. Also, if the naming convention for the files is not clear and concise, it can be difficult to determine which backup contains the file you want.

Extended Attributes

Disable them

Without Compression

$ /usr/bin/tar -cvf myTarNameThatICanNameAnythingICouldPossiblyWant.tar /path/to/directory/I/want/to/tar/up/goes/here

With Compression

$ /usr/bin/tar -cjvf myTarNameThatICanNameAnythingICouldPossiblyWant.tbz2 /path/to/directory/I/want/to/tar/up/goes/here
  • Please note that /path/to/directory/I/want/to/tar/up/goes/here can be an absolute or relative path.
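Before trusting a quick archive, it can help to try a round trip on throwaway data. The following is only a minimal sketch under stated assumptions: the directory and archive names are made up, it uses whichever tar is first on the PATH, and it leaves out compression (adding -j as in the command above should behave the same way).

```shell
#!/bin/sh
# Round-trip check on throwaway data; every name here is hypothetical.
set -eu
work=$(mktemp -d)
mkdir -p "$work/project/notes" "$work/restore"
echo "subject 01" > "$work/project/notes/readme.txt"
echo "1,2,3"      > "$work/project/data.csv"

# Create the archive from a relative path (-C changes directory first).
tar -cf "$work/project.tar" -C "$work" project

# List the contents without extracting anything.
listing=$(tar -tf "$work/project.tar")
echo "$listing"

# Extract into a separate directory and compare against the original.
tar -xf "$work/project.tar" -C "$work/restore"
if diff -r "$work/project" "$work/restore/project"; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
```

If the last line reports anything other than ok, the archive should not be relied on as a backup.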

Key Concepts

Differential Backup
a differential backup means that only the files that have changed since the last snapshot are backed up
Snapshot
think of a snapshot in terms of photography. It's a "picture" of how your files looked at a certain time. This picture can be compared to the current state of your files to determine which ones have changed and need to be backed up
Tar
tar is a GNU utility that can take many different files and store them in a single, large file or split across multiple files of a preset size (say for backing up to DVD)
Compression
compression is the process of reducing the size of a file while not losing any information. Depending on how much you attempt to compress the files, it could take a considerable amount of processing time over simply archiving them.
Archive
the process of taking data offline for long-term storage.
Path
location of a file on a computer system
Extended Attributes
If you actually care what they are, please see Extended Attributes. If you don't care, just take my word for it: if you are archiving data on OS X, you likely want to disable these. For how to do so, please see Disabling Extended Attributes on OSX.

Key Tar'isms

  • Tar removes the leading / from an Absolute Path. This means it does not automatically restore the files to their original location, but relative to the directory you extract in. If you gave tar an absolute path to your backup directory, you must extract from the root directory (/) to restore the files to their original location. If tar was given a Relative Path, you must extract the tar within the same directory in which you created it.
  • Listing the contents of a non-differential tar is done as follows:
$ /usr/bin/tar --list --file=myTar.tar 
  • Listing the contents of a differential tar is done as follows:
$ /usr/bin/tar --list --file=myTar.tar --listed-incremental=mySnar.snar
  • Tar can only support file names of up to 100 characters. If you have super long file paths, tar will error out with something to the tune of:
/usr/bin/tar: <super long file name>: file name too long to be stored in a GNU header

There is a fix to this issue. Say you're backing up a path such as /Volumes/username/work/my_data/raw_data/ and the only thing you really want is the raw_data. GNU tar documents --strip-components (man tar) as an option that strips X leading directory names on extraction, so it does not shorten the stored names at creation time. To leave the /Volumes/username/work/my_data portion out of the stored paths when creating the archive, have tar change into that directory first with -C (--directory):

/usr/bin/tar --create --file=myTar.tar -C /Volumes/username/work/my_data raw_data

When myTar.tar is extracted, it will only be the raw_data directory.
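Both the leading-slash behaviour and the idea of storing only raw_data can be checked on throwaway data. This is a sketch, not the tutorial's own procedure: the paths under a temporary directory stand in for /Volumes/username/work/my_data, and it relies on -C (--directory) to make tar change directory before reading the path.

```shell
#!/bin/sh
# Hypothetical paths under a temporary directory stand in for /Volumes/username/...
set -eu
work=$(mktemp -d)
mkdir -p "$work/work/my_data/raw_data"
echo "scan" > "$work/work/my_data/raw_data/run1.txt"

# 1. Absolute path: tar stores the member names without the leading "/".
#    (stderr is discarded only to hide tar's notice about removing it.)
tar -cf "$work/abs.tar" "$work/work/my_data/raw_data" 2>/dev/null
abs_first=$(tar -tf "$work/abs.tar" | head -n 1)
echo "absolute: $abs_first"

# 2. -C changes into the parent directory first, so only raw_data/ is stored.
tar -cf "$work/rel.tar" -C "$work/work/my_data" raw_data
rel_first=$(tar -tf "$work/rel.tar" | head -n 1)
echo "relative: $rel_first"
```

The first listing shows the full directory chain minus its leading slash, while the second starts directly at raw_data/.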

Preparation

Disabling Extended Attributes on OSX

If you actually care what they are, please see Extended Attributes. If you don't care, just take my word for it: if you are archiving data on OS X, you likely want to disable these.

For Tiger:

export COPY_EXTENDED_ATTRIBUTES_DISABLE=true

For Leopard:

export COPYFILE_DISABLE=true
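An export affects every later command in the same shell. If that is not wanted, the variables can be set for a single tar run instead. The sketch below assumes throwaway data and made-up names; on systems other than OS X the variables are simply ignored.

```shell
#!/bin/sh
# Sketch: scope the OS X settings to one tar run instead of exporting them.
set -eu
work=$(mktemp -d)
mkdir "$work/data"
echo "sample" > "$work/data/file.txt"

# Both names are set so the same line works on Tiger and on Leopard.
COPY_EXTENDED_ATTRIBUTES_DISABLE=true COPYFILE_DISABLE=true \
    tar -cf "$work/data.tar" -C "$work" data

# On OS X, extended attributes show up as extra "._name" members.
# Count them; zero means none were archived.
appledouble=$(tar -tf "$work/data.tar" | grep -c '/\._' || true)
echo "AppleDouble members: $appledouble"
```

Because the assignments are a prefix to the command, the rest of the shell session keeps its original environment.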

Differential Tar Without Compression

The first example explains how to do a differential backup using tar with no compression. This will be the fastest way to archive your files, but since no compression is used it will also take up the most space. This is the ideal way of creating backups of your live data, such as a weekly backup. More on that in the section Good Backup Practices.

Pros and Cons

Pros
Quick, saves more space than non-differentials
Cons
Must read this section, doesn't save as much space as compression, requires a bit more effort and forethought in setup, and will probably require a bit of practice on test directories to become comfortable with the process and what to expect.

Using a Mounted Drive

One way to do this is to mount your data drive and then perform the backup locally. This is most likely your miles Songbook data drive. Please refer to Mount AFP Share for details on mounting drives.

After this is done, you must know the following information

  • Path to your data drive. It will be in /Volumes/share_name where share_name is the name of the drive you chose to mount in Finder
  • Path to your backup directory. This may be an external FireWire drive found in /Volumes or it might just be a /Users/you/Documents/Backups directory on your machine. Either way, take note of its Full Path.

We have to tell tar two important things besides our paths. The first is what we want to name our backup file and the second is what Snapshot we wish to make a backup against.

Do not name your backup file the same as older backup files unless you wish to overwrite them!!

Examples

After mounting my home directory on miles (found at /Volumes/username), I wish to make an initial backup to my FireWire drive mounted on /Volumes/backups.

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.1.tar --listed-incremental=/Volumes/backups/my_home.snar /Volumes/username

The result will be two files created in your /Volumes/backups.

  • my_home.1.tar This is the actual backup tar file. It holds all the data. We named it ".1." because it is our first backup and so contains all the data held in my home directory.
  • my_home.snar This is the Snapshot file we spoke of. It just contains data that lets tar know which files have changed the next time we do a backup.

To create a second backup that only contains the files that have either been changed or added since we created my_home.1.tar backup, we do the following.

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.2.tar --listed-incremental=/Volumes/backups/my_home.snar /Volumes/username

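This will compare the Snapshot file my_home.snar with the files in /Volumes/username and only back up those that have been changed or added since my_home.1.tar was made. Tar also updates my_home.snar during this run, so the next backup is measured against my_home.2.tar.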

To create a third backup that only contains the files that have either been changed or added since we created my_home.2.tar backup, we do the following.

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.3.tar --listed-incremental=/Volumes/backups/my_home.snar /Volumes/username

This will compare the Snapshot file my_home.snar with the files in /Volumes/username and only back up those that have been changed or added since my_home.2.tar was created.
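The sequence above can be tried safely on throwaway data before pointing it at a real home directory. This is a minimal sketch that assumes GNU tar; the temporary directory, the file names, and the one-second pause (there only so the new file's timestamp is clearly later than the snapshot) are all illustrative choices.

```shell
#!/bin/sh
# Sketch of the backup sequence on throwaway data (requires GNU tar).
set -eu
work=$(mktemp -d)
mkdir "$work/home" "$work/backups"
echo "unchanged" > "$work/home/old.txt"
snar="$work/backups/my_home.snar"

# Backup 1: no snapshot exists yet, so everything is archived.
tar --create --file="$work/backups/my_home.1.tar" \
    --listed-incremental="$snar" -C "$work" home

# Wait so the new file's timestamp is clearly later than the snapshot.
sleep 1
echo "added later" > "$work/home/new.txt"

# Backup 2: same snapshot file, new archive name.
tar --create --file="$work/backups/my_home.2.tar" \
    --listed-incremental="$snar" -C "$work" home

first=$(tar --list --file="$work/backups/my_home.1.tar")
second=$(tar --list --file="$work/backups/my_home.2.tar")
printf 'backup 1:\n%s\nbackup 2:\n%s\n' "$first" "$second"
```

The second listing should name new.txt but not old.txt, since old.txt did not change after the first backup.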

Restoring Incremental Tar Files

The restore process is similar to the creation process. It restores all the files in the archive to the current directory. You must still give the --listed-incremental option when un-tarring (GNU tar does not actually read the Snapshot file during extraction, and its manual passes /dev/null here). However, restoring the entire backed-up directory to the exact state it was in at the time of a given incremental also requires all previous backups, because each incremental contains only the changed files. This is best understood by example. To restore everything back to exactly how it was when I made my_home.1.tar, we would execute:

$ /usr/bin/tar --extract --listed-incremental=/Volumes/backups/my_home.snar --file=/Volumes/backups/my_home.1.tar

This would place a directory named Volumes/username and all subdirectories/files in the current directory. Notice that the preceding / is stripped from the tar path name.

To restore the files to their exact state at the point that my_home.2.tar was made, we would:

$ /usr/bin/tar --extract --listed-incremental=/Volumes/backups/my_home.snar --file=/Volumes/backups/my_home.1.tar
$ /usr/bin/tar --extract --listed-incremental=/Volumes/backups/my_home.snar --file=/Volumes/backups/my_home.2.tar

And finally, to restore to the exact state at the time that my_home.3.tar was made, we would:

$ /usr/bin/tar --extract --listed-incremental=/Volumes/backups/my_home.snar --file=/Volumes/backups/my_home.1.tar
$ /usr/bin/tar --extract --listed-incremental=/Volumes/backups/my_home.snar --file=/Volumes/backups/my_home.2.tar
$ /usr/bin/tar --extract --listed-incremental=/Volumes/backups/my_home.snar --file=/Volumes/backups/my_home.3.tar

The archives are always extracted in order, from the original full backup up to the most recent state you wish to restore.

Please make special note that this will restore the *exact* state of the file system. If a file existed at the time my_home.1.tar was created, but was then deleted before my_home.2.tar was created, then restoring up to my_home.2.tar will remove that file.
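The deletion behaviour described above is easiest to believe after seeing it on throwaway data. The following sketch assumes GNU tar and uses made-up names throughout. It passes /dev/null as the snapshot when extracting, which is how the GNU tar manual shows it, since the snapshot is not read during extraction.

```shell
#!/bin/sh
# Sketch: restore an incremental chain and watch a deleted file disappear.
# Requires GNU tar; all names are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/home" "$work/backups" "$work/restore"
echo "kept"    > "$work/home/keep.txt"
echo "deleted" > "$work/home/gone.txt"
snar="$work/backups/my_home.snar"

tar --create --file="$work/backups/my_home.1.tar" \
    --listed-incremental="$snar" -C "$work" home

sleep 1                      # keep timestamps clearly after the snapshot
rm "$work/home/gone.txt"     # deleted between backup 1 and backup 2
echo "new" > "$work/home/added.txt"

tar --create --file="$work/backups/my_home.2.tar" \
    --listed-incremental="$snar" -C "$work" home

# Restore oldest first. The snapshot is not read during extraction,
# so /dev/null is enough to switch on incremental mode.
for n in 1 2; do
    tar --extract --listed-incremental=/dev/null \
        --file="$work/backups/my_home.$n.tar" -C "$work/restore"
done
restored=$(ls "$work/restore/home")
echo "$restored"
```

After the first extraction gone.txt is present in the restore directory. After the second it should be removed again, leaving only the files that existed when my_home.2.tar was made.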

Incremental Tars Split at 4GB Each for Storage to Media

Within the UNIX world, we call this a "Multi-Part Tar Archive". I chose the 4GB size since it is assumed the storage media of choice would be a DVD. The process is exactly as outlined under the section Differential Tar Without Compression, with two new options:

-M (--multi-volume)
tells tar to split the volume into parts
-L (--tape-length=size)
designates the length of each volume in units of 1024 bytes (kilobytes). For a DVD, this would be approximately 4194304 (4 GB)
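The figure 4194304 follows directly from the unit. As a small worked example (the variable names are illustrative, and the 4.7 GB figure is the usual marketing capacity of a single-layer DVD in decimal bytes):

```shell
#!/bin/sh
# --tape-length counts units of 1024 bytes, so convert GiB to that unit.
gib=4                                  # target volume size in GiB
tape_length=$((gib * 1024 * 1024))     # GiB -> KiB
echo "--tape-length=$tape_length"      # prints --tape-length=4194304

# A single-layer DVD is sold as 4.7 GB (decimal), i.e. 4700000000 bytes.
dvd_kib=$((4700000000 / 1024))
echo "DVD capacity in KiB: $dvd_kib"
```

Since the DVD capacity in the same unit is larger than 4194304, a 4 GB volume leaves some headroom on the disc.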

As with incremental backups, it's good to use a descriptive file naming pattern. For multi volume tars, I designate the volume number by appending part0X to each file where X represents the volume order. An example using the above commands in conjunction with incremental tar:

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.1.part01.tar \
> --multi-volume --tape-length=4194304 \
> --listed-incremental=/Volumes/backups/my_home.snar /Volumes/username

The Multi Volume Interactive Prompt

Each time tar reaches the designated maximum size, it prompts you for some information before continuing. The prompt looks like this:

Prepare volume #2 for `./backup/my_home.1.part01.tar' and hit return:

This prompt takes the following responses:

?
Request tar to explain possible responses
q
request tar to exit immediately
n file-name
request that tar write the next archive using file name file-name
y
request that the next writing begin

So for our example archive, the process might look like this:

Prepare volume #2 for `my_home.1.part01.tar' and hit return: n my_home.1.part02.tar  
Prepare volume #2 for `my_home.1.part02.tar' and hit return: y

Notice that we gave the argument 'n' and then designated the next filename to be my_home.1.part02.tar, differentiating it from the first file, my_home.1.part01.tar. Tar will go through this prompt sequence for each archive file it needs to create.

If you provide the same name or simply enter 'y', the original file will be overwritten!!
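The prompt can be avoided when the volume names are known in advance: GNU tar accepts several --file options and uses them in order as successive volumes. The sketch below assumes GNU tar, uses a deliberately tiny volume size so it runs quickly, and offers more names than it expects to need. All names are hypothetical.

```shell
#!/bin/sh
# Sketch: name every volume up front so GNU tar does not have to prompt.
# Sizes are tiny so the example runs quickly; all names are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/home" "$work/backups" "$work/restore"
# About 2.4 MiB of incompressible data.
head -c 2500000 /dev/urandom > "$work/home/big.bin"

b="$work/backups/my_home.1"
# 1024 KiB per volume; four names are offered, only as many as needed get used.
tar --create --multi-volume --tape-length=1024 \
    --file="$b.part01.tar" --file="$b.part02.tar" \
    --file="$b.part03.tar" --file="$b.part04.tar" \
    -C "$work" home </dev/null

ls "$work/backups"

# Extraction takes the same list of volumes in the same order.
tar --extract --multi-volume \
    --file="$b.part01.tar" --file="$b.part02.tar" \
    --file="$b.part03.tar" --file="$b.part04.tar" \
    -C "$work/restore" </dev/null
```

If the data needs more volumes than there are names, tar falls back to prompting, so it is worth listing a few spare names.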

Restoring Multi Volume Incremental Tar Archives

Restoration is done exactly as described in Restoring Incremental Tar Files, with the slight change of adding the --multi-volume switch. As with creation, tar asks the user for the name of each subsequent volume. As before, each subsequent volume name is entered and then confirmed with 'y'. A typical extraction might look like:

 $ /usr/bin/tar --extract --multi-volume --listed-incremental=/Volumes/backups/my_home.snar --file=/Volumes/backups/my_home.1.part01.tar

Which would produce a prompt/response:

Prepare volume #2 for `my_home.1.part01.tar' and hit return: n my_home.1.part02.tar  
Prepare volume #2 for `my_home.1.part02.tar' and hit return: y

Compressing Tar Archives to Maximize Space

Tar has options for compressing data to varying degrees. Doing so can reduce the amount of space your archives take up. However, it can also take a heavy toll on system resources. Running tar, and especially compressed tar, on community resources is strongly discouraged, as it will very likely have a negative impact on everyone else sharing that resource. Regardless of the method you choose for archiving your data, compression is added by appending one of two arguments (just pick one):

For Gzip format
-z, --gzip
For Bzip2 format
-j, --bzip2

This argument must be given at the time of creation and again when extracting or listing the archives. It is also conventional to use the extension .tgz or .tbz2 respectively in place of the normal .tar for archive names.
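How much space compression saves depends heavily on the data. A quick comparison on a sample is a reasonable way to find out. The sketch below uses made-up names and deliberately repetitive text, so it overstates the gain compared with typical binary data. It uses --gzip; swapping in --bzip2 and a .tbz2 name should work the same way where bzip2 is installed.

```shell
#!/bin/sh
# Sketch: compare a plain archive with a gzip-compressed one (names hypothetical).
set -eu
work=$(mktemp -d)
mkdir "$work/home"
# Repetitive text compresses well; real data will vary.
i=0
while [ "$i" -lt 5000 ]; do
    echo "subject,condition,response_time"
    i=$((i + 1))
done > "$work/home/log.csv"

tar --create --file="$work/my_home.tar" -C "$work" home
tar --create --gzip --file="$work/my_home.tgz" -C "$work" home

plain=$(wc -c < "$work/my_home.tar")
packed=$(wc -c < "$work/my_home.tgz")
echo "plain: $plain bytes, gzip: $packed bytes"

# The same switch is passed again when listing or extracting.
tar --list --gzip --file="$work/my_home.tgz"
```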

Good Backup Practices

When should I use the Differential Tar Without Compression method?

This method is ideal for doing monthly full and weekly differential backups and doing them quickly. For example, on the first of every month you would perform an initial full backup, then each week an incremental based on that full.

Here's a decent method for managing that:

January's full backup:

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.2008-01-07.tar --listed-incremental=/Volumes/backups/my_home.2008-01.snar /Volumes/username

January's first incremental:

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.2008-01-15.tar --listed-incremental=/Volumes/backups/my_home.2008-01.snar /Volumes/username

January's second incremental:

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.2008-01-22.tar --listed-incremental=/Volumes/backups/my_home.2008-01.snar /Volumes/username

January's third incremental:

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.2008-01-28.tar --listed-incremental=/Volumes/backups/my_home.2008-01.snar /Volumes/username

February's full backup:

$ /usr/bin/tar --create --file=/Volumes/backups/my_home.2008-02-07.tar --listed-incremental=/Volumes/backups/my_home.2008-02.snar /Volumes/username

Notice that we changed the dates to the next month in February and we will do the same for subsequent months. This will create 4 backup files for each month, one full backup and 3 incremental. As subsequent months roll by, you may choose to delete the older backups or burn them to DVD for removal to make room for newer ones.
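The naming scheme above is mechanical enough to derive from a date. The helper below is a hypothetical sketch, not part of tar: backup_names and its hard-coded /Volumes/backups destination are assumptions made for illustration, and the script only prints the tar command rather than running it.

```shell
#!/bin/sh
# Sketch: derive the archive and snapshot names used above from a date.
# backup_names is a hypothetical helper, not part of tar.
backup_names() {
    day=$1                    # e.g. 2008-01-15
    month=${day%-*}           # strip the day: 2008-01
    dest=/Volumes/backups
    echo "$dest/my_home.$day.tar $dest/my_home.$month.snar"
}

# Fixed date for illustration; use "$(date +%Y-%m-%d)" for today.
set -- $(backup_names 2008-01-15)
archive=$1
snapshot=$2
echo "tar --create --file=$archive --listed-incremental=$snapshot /Volumes/username"
```

Because the snapshot name changes only with the month, the first run of a new month finds no snapshot file and therefore produces a full backup, matching the schedule described above.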

Compression can be added to the same command with --bzip2. Note that GNU tar cannot write compressed multi-volume archives (it stops with "Cannot use multi-volume compressed archives"), so --multi-volume and --tape-length have to be left out when compressing:

 $ /usr/bin/tar --create --bzip2 --file=/Volumes/backups/my_home.2008-02-07.tbz2 \
 > --listed-incremental=/Volumes/backups/my_home.2008-02.snar /Volumes/username
 It's not advisable to continuously do incremental backups based on one very old full backup. Rather, do a full every X days/weeks and incrementals in between based on that full. This is why we create a fresh, full backup every month in the example.

What's the best way to create a permanent archive and when should I do it?

An archive of files should be made when the data is no longer being changed, modified, or used and will not be needed for the foreseeable future (e.g. the next 6 months, a year, or more). The best way to do this is to archive to an external disk or DVD. If you choose to back up to a DVD, you should take a look at the section Incremental Tars Split at 4GB Each for Storage to Media.

How safe will my data be if I do all this?

Well, this is a tricky question. Most people don't use DVDs or single disks for backing up important data, so there isn't a lot of research investigating their overall effectiveness. If we ignore things like misuse, abuse, sloppy filing, etc., they could be on par with Tape Media (Tar was originally designed for tape media).

Recovery of data from tape media can fail as often as 15% of the time. The good news is that if you make duplicates of the tape, the chance of a failed recovery drops to around 2%. So if the data is very important, making a duplicate of the DVD archive is a very smart thing to do, as it vastly increases the chances of data recovery.

Regardless, one thing that is very important to keep in mind is that the data will be leaps and bounds safer than if nothing is backed up at all.

This seems like a lot of work to manage

That it does. Backups take a significant investment of time and mental resources when done manually. It's best to put a decent amount of effort into designing a predictable, logical, and consistently organized data/directory structure and then automate many of the tasks via shell scripts.

External Links