Archive compression is the process of packing one or more files and directories into a single archive file and reducing its size. Archiving and compression are often separate steps: tar bundles files into an archive, gzip and bzip2 compress a file using their own algorithms, and zip does both. Compressed archives save disk space and are easier to transfer over the network. The archive can later be decompressed and the original files restored.
gzip
gzip is a widely used compression tool in Linux and Unix systems that compresses a single file into a smaller file with a .gz extension. It uses the DEFLATE algorithm to achieve lossless data compression. The compressed file is usually smaller than the original, making it easier to transfer or store. The compressed file can be restored to its original state using the gzip -d command.
The syntax for the gzip command is:
gzip [OPTIONS] [FILE_NAME]
Options:
- -1 to -9: compression level; 1 is the fastest with the least compression, 9 is the slowest with the most compression
- -d: decompress the file
- -f: force compression/decompression, even if the output file already exists or the input file has multiple links
- -h: display help/usage information
- -k: keep the original file, do not delete it
- -r: process directories recursively
- -t: test compressed file for integrity
- -v: display verbose information
- -l: list the compressed size, uncompressed size, and compression ratio of each compressed file
Example:
The gzip command compresses a file with the DEFLATE algorithm, which combines LZ77 matching with Huffman coding. The simplest usage passes only the file name, for example "gzip newfile.txt".
This will create a compressed file "newfile.txt.gz" and delete the original file.
To prevent this, you can use the -c option and use output redirection to write the output to the "newfile.gz" file, as in "gzip -c newfile.txt > newfile.gz".
The -c option sends the compressed data to the standard output stream, leaving the original file intact. Alternatively, the -k option keeps the original file while writing "newfile.txt.gz", as in "gzip -k newfile.txt".
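The three behaviours described above can be compared side by side. The following is a minimal sketch, assuming a gzip version that supports -k; the scratch directory and the file names plain.txt, streamed.txt, and kept.txt are placeholders chosen for illustration.

```shell
set -eu
workdir=$(mktemp -d)              # scratch directory for the demo
trap 'rm -rf "$workdir"' EXIT     # remove it when the script exits

# Three identical sample files (placeholder names).
for name in plain.txt streamed.txt kept.txt; do
    awk 'BEGIN { for (i = 0; i < 1000; i++) print "sample line", i }' > "$workdir/$name"
done

# Default behaviour: plain.txt is replaced by plain.txt.gz.
gzip "$workdir/plain.txt"

# -c writes to standard output, so the original stays in place.
gzip -c "$workdir/streamed.txt" > "$workdir/streamed.gz"

# -k keeps the original next to kept.txt.gz.
gzip -k "$workdir/kept.txt"

ls -l "$workdir"
```

The listing should show no plain.txt, while streamed.txt and kept.txt remain next to their compressed copies.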
There are various levels of compression. Higher levels produce smaller files but take longer to compress. Levels range from 1 (fastest, least compression) to 9 (slowest, most compression), and the default is 6.
You can choose a specific level with the -<NUMBER> option, for example "gzip -9 newfile.txt".
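To see the effect of the level, the same input can be compressed at both extremes and the sizes compared. This is a rough sketch: the generated sample data and the file names are assumptions, and the size difference depends heavily on the input. It also uses gzip -l, which prints the compression ratio of existing .gz files.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Repetitive sample input so the difference between levels is visible.
awk 'BEGIN { for (i = 0; i < 20000; i++) print "record", i, i * i, i % 7 }' > "$workdir/data.txt"

gzip -1 -c "$workdir/data.txt" > "$workdir/fast.gz"   # fastest, least compression
gzip -9 -c "$workdir/data.txt" > "$workdir/best.gz"   # slowest, most compression

fast_size=$(wc -c < "$workdir/fast.gz")
best_size=$(wc -c < "$workdir/best.gz")
echo "level 1: $fast_size bytes, level 9: $best_size bytes"

# Show compressed size, uncompressed size, and ratio for both files.
gzip -l "$workdir/fast.gz" "$workdir/best.gz"
```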
tar
tar is a file archiving utility in Linux used for packaging multiple files into a single archive file (a tar file). The tar archive format is commonly used for backups and for distributing multiple files as a single archive. tar itself does not compress data; tar files can be compressed using gzip, bzip2, or other compression utilities. The basic syntax to create a tar archive is:
tar -cvf archive.tar file1 file2
Options used in the above command:
- -c: create an archive
- -v: verbose output
- -f: file, used to specify the name of the archive file
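Creating an archive with these options can be tried in a scratch directory. A minimal sketch, in which file1.txt and file2.txt are placeholder files created only for the demonstration:

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Two small sample files (placeholder names).
echo "first"  > file1.txt
echo "second" > file2.txt

# -c create, -v print each file as it is added, -f name the archive.
tar -cvf archive.tar file1.txt file2.txt
```

Unlike gzip, tar leaves the input files in place after adding them to the archive.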
To extract the contents of a tar archive:
tar -xvf archive.tar
Options used in the above command:
- -x: extract files
- -v: verbose output
- -f: file, used to specify the name of the archive file
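To extract them to a specific directory, use the -C option with a directory that already exists, for example "tar -xvf archive.tar -C /path/to/directory".
You can also just list the files contained in an archive, without extracting them, using the -t option, as in "tar -tf archive.tar".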
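Listing an archive and extracting it into a chosen directory can be sketched as follows. The directory name restored and the sample files are assumptions made for the example.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "first"  > file1.txt
echo "second" > file2.txt
tar -cf archive.tar file1.txt file2.txt

# List the members without extracting anything.
listing=$(tar -tf archive.tar)
echo "$listing"

# Extract into a separate directory; it must exist before tar runs.
mkdir restored
tar -xvf archive.tar -C restored
```

Adding -v to the listing (tar -tvf) also prints permissions, owner, size, and timestamp for each member.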
tar is often used to create a compressed archive by gzipping the archive as it is written. This is done using the -z option, for example "tar -czf archive.tar.gz file1 file2".
This is just like creating a tar archive and then running gzip on it.
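The equivalence between the one-step and two-step approaches can be checked directly. A minimal sketch with placeholder file names; the two .tar.gz files may differ byte for byte (for example in gzip header fields), so the comparison is on the archive contents.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "first"  > file1.txt
echo "second" > file2.txt

# One step: archive and gzip together.
tar -czf onestep.tar.gz file1.txt file2.txt

# Two steps: archive first, then gzip (twostep.tar becomes twostep.tar.gz).
tar -cf twostep.tar file1.txt file2.txt
gzip twostep.tar

# Both archives list the same members.
tar -tzf onestep.tar.gz
tar -tzf twostep.tar.gz
```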
To unpack a gzipped archive, you can run gunzip or gzip -d and then extract the resulting tar file, but "tar -xf archive.tar.gz" recognizes that the archive is gzipped and handles both steps for you.
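Both routes can be compared in a short sketch. It assumes a tar implementation that detects compression on extraction (GNU tar and bsdtar do); the directory names manual and auto are placeholders.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

echo "first" > file1.txt
tar -czf archive.tar.gz file1.txt
mkdir manual auto

# Manual route: decompress to a plain tar file, then extract it.
gzip -dc archive.tar.gz > manual/archive.tar
tar -xf manual/archive.tar -C manual

# Direct route: tar detects the gzip compression on its own.
tar -xf archive.tar.gz -C auto

cmp manual/file1.txt auto/file1.txt && echo "both routes give the same file"
```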
bzip2
bzip2 is a lossless compression tool in Linux used to compress and decompress files. It is based on the Burrows-Wheeler transform and usually achieves better compression ratios than gzip, at the cost of slower compression.
Syntax:
bzip2 [OPTIONS] [FILE_NAME]
Options:
- -z: Compress the file
- -d: Decompress the file
- -k: Keep the original file after compression
- -v: Verbose mode; show the compression ratio for each file processed
- -f: Force overwriting of existing output files
- -t: Test the integrity of the compressed file
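Example:
Running "bzip2 file.txt" will compress the file "file.txt" and create "file.txt.bz2". The original file is removed unless -k is given.
Running "bzip2 -d file.txt.bz2" will decompress the file "file.txt.bz2" and recreate "file.txt".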
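A full round trip with bzip2 can be sketched as follows. The copy named original.txt is an assumption added only so the result can be compared with the input.

```shell
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

awk 'BEGIN { for (i = 0; i < 1000; i++) print "sample line", i }' > file.txt
cp file.txt original.txt          # reference copy for the comparison below

bzip2 file.txt                    # file.txt -> file.txt.bz2
bzip2 -t file.txt.bz2             # integrity check; silent on success
bzip2 -d file.txt.bz2             # file.txt.bz2 -> file.txt

cmp file.txt original.txt && echo "round trip OK"
```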
zip
The zip command is used to compress and archive files and directories in a zip archive. It works by compressing each file individually, then combining all the compressed files into a single archive file. The syntax for using the zip command is:
zip [options] archive.zip file1 [file2 ...]
Where:
- options are various optional switches that can modify the behavior of the zip command.
- archive.zip is the name of the archive file that will be created.
- file1 [file2 ...] are the names of the files and/or directories to be included in the archive.
Some common options used with the zip command include:
- -r (or --recurse-paths): include all files and subdirectories within a specified directory.
- -9 (or --best): use the highest level of compression.
- -u (or --update): update an existing archive file, adding or replacing files as necessary.
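For example, "zip -r archive.zip mydir" stores the directory "mydir" (a placeholder name) and everything in it in "archive.zip".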
To extract files from a zip archive, use the unzip command.
unzip
unzip is a command-line utility in Unix-based systems that extracts compressed archive files (with a .zip extension) and restores the original files. The syntax is:
unzip [options] archive.zip [file(s) ...] [-d destination]
Common options:
- -l: List the contents of an archive without actually extracting the files.
- -o: Overwrite files without asking for confirmation.
- -d: Specify the destination directory for the extracted files.
- -p: Extract files to standard output (a pipe) instead of writing them to disk.
- -q: Quiet mode; suppress most informational messages.
If the [file(s) ...] argument is not specified, unzip will extract all the files in the archive. If a destination is not specified with -d, unzip will extract the files to the current working directory.