14

What is the difference between archiving and compression in Linux?

We have different commands for each, and they can be combined, but what exactly are they?

KawaiKx

3 Answers

18

Archiving means taking several files, say 10, and combining them into one file, with essentially no change in total size. If you start with ten 100KB files and archive them, the resulting single file is about 1000KB (plus a small amount of per-file header data). On the other hand, if you compress those 10 files individually, the resulting files may range from only a few kilobytes to close to the original 100KB, depending on the original file type. (source)
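To make the size arithmetic above concrete, here is a minimal Python sketch using the standard `tarfile` and `gzip` modules. The file names and contents are made up for illustration (five files of repetitive text, five of random bytes), and `build_tar` is a hypothetical helper, not part of any library.

```python
import gzip
import io
import os
import tarfile

SIZE = 100 * 1024  # 100 KB per file


def build_tar(files):
    """Pack {name: bytes} into an uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Ten hypothetical files: repetitive text compresses well, random bytes do not.
files = {}
for i in range(5):
    files[f"text{i}.txt"] = (b"hello world\n" * 9000)[:SIZE]
    files[f"random{i}.bin"] = os.urandom(SIZE)

total = sum(len(data) for data in files.values())
archive = build_tar(files)
print(f"sum of inputs : {total} bytes")
print(f"tar archive   : {len(archive)} bytes (headers and padding add a little)")

# Compress each file on its own, as a compressor like gzip would.
compressed = {name: len(gzip.compress(data)) for name, data in files.items()}
for name, size in compressed.items():
    print(f"{name:12s} {SIZE} -> {size} bytes")
```

The archive comes out marginally larger than the inputs, while the per-file compressed sizes vary widely with the content.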

Djamel F.
2

Compression is the process of taking some input data and re-encoding it with an algorithm (effectively transforming the bits) so that the same information takes up less space.

This is useful if you want to keep more data in less space (space is always a limited resource), or if you just want faster file transfers over a network.

Popular compression utilities on Linux distributions are:

  • gzip (frequently used);

  • bzip2 (less frequently used, yet usually produces smaller output than gzip);

  • xz (usually the most space-efficient of these tools);

  • zip (often used for unpacking data that was packed with zip on other systems, such as Windows).

    Note that, generally, the more space-efficient a compression method is, the more time it takes.
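As a rough illustration of the size/time trade-off, the following sketch runs the same input through Python's standard `gzip`, `bz2` and `lzma` modules, which implement the formats used by the gzip, bzip2 and xz tools. The sample data is invented, and the actual numbers depend heavily on the input and on compression levels, so treat the output as indicative only.

```python
import bz2
import gzip
import lzma
import time

# Hypothetical sample input: about 1 MB of similar but not identical text lines.
data = "".join(
    f"{i}: the quick brown fox jumps over the lazy dog\n" for i in range(20000)
).encode()

codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),  # lzma defaults to the .xz container
}

results = {}
for name, (compress, decompress) in codecs.items():
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    # Lossless compression must give back exactly the original bytes.
    roundtrip_ok = decompress(packed) == data
    results[name] = (len(packed), elapsed, roundtrip_ok)
    print(f"{name:6s} {len(data)} -> {len(packed)} bytes in {elapsed:.3f}s")
```

zip is left out here because it is an archive format as well, so it is not directly comparable to the three stream compressors.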

Archiving, on the other hand, can be thought of as putting several different files into one box. If you have 5 files of 10 KB each, archiving them gives you 5 x 10 = 50 KB, and that is it.

Note that on Linux we have a very good program, tar, which, when given a compression option such as -z, does both:

  1. archives the input (first step);
  2. and then compresses that archive.
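The two steps can be sketched with Python's standard `tarfile` module, where mode `"w"` writes a plain tar archive and mode `"w:gz"` writes the same kind of archive through gzip. The file names and contents below are invented, and `make_archive` is a hypothetical helper written for this example.

```python
import gzip
import io
import tarfile


def make_archive(files, mode):
    """Write {name: bytes} to an in-memory tar; mode is "w" or "w:gz"."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Hypothetical inputs with plenty of repetition.
files = {"a.txt": b"alpha\n" * 2000, "b.txt": b"beta\n" * 2000}

plain = make_archive(files, "w")     # step 1 only: archive
tgz = make_archive(files, "w:gz")    # step 1 + step 2: archive, then gzip
print(f"tar    : {len(plain)} bytes")
print(f"tar.gz : {len(tgz)} bytes")

# Undoing only the compression step leaves an ordinary tar stream.
inner = gzip.decompress(tgz)
with tarfile.open(fileobj=io.BytesIO(inner), mode="r:") as tar:
    names = tar.getnames()
print(names)
```

Decompressing the .tar.gz with plain gzip yields a stream that opens as an uncompressed tar, which shows that the compression is a separate layer wrapped around the archive.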
Giorgi Tsiklauri
  • Your answer mixes up some things. First off, zip is both a compressor _and_ an archiver, whereas gzip, bzip2, and xz are all _just_ compressors. Second, tar is just an archiver, which has options to call utilities _external_ to tar to also compress the output. The tar format is only an archiving format, and originally its output was piped to the external compressor. This was such a common thing to do that the piping to the external compressor was later built into tar. Your answer could be better organized by clearly differentiating archiving from compressing, and then showing how they are combined. – Mark Adler Jul 15 '21 at 18:57
  • The difference in approach between zip and tar.[compressed format] could be highlighted, where zip compresses and then archives, whereas tar.gz archives and then compresses. Each has its own advantages: zip offers random access, and tar.gz offers better compression. – Mark Adler Jul 15 '21 at 18:58
  • @MarkAdler Unfortunately I don't have any distribution at hand right at this moment, but I'm quite certain that everything I wrote here was thoroughly checked against the man pages and info pages of those utilities. Maybe your knowledge is based on some other distribution's binaries? I don't know, but, again, CentOS 7.x was examined with its respective applications while going through the Linux Certified Course, and I remember well that I checked all this thoroughly. – Giorgi Tsiklauri Jul 16 '21 at 07:56
  • But some peculiarity of those programs might well be missing. However, what is written is not wrong. I'll check a bit later whether any addendum can be added. – Giorgi Tsiklauri Jul 16 '21 at 07:58
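The zip versus tar.gz trade-off raised in the comments can be sketched with Python's standard `zipfile` and `tarfile` modules. The input here is deliberately contrived to favour tar.gz (50 hypothetical files that all hold the same 2000 pseudo-random bytes), so the size gap is far larger than you would see on typical data.

```python
import io
import random
import tarfile
import zipfile

# Contrived input: each file is incompressible on its own, but the files
# are identical to each other, so redundancy exists only across files.
rng = random.Random(0)
shared = bytes(rng.getrandbits(8) for _ in range(2000))
files = {f"file_{i:02d}.bin": shared for i in range(50)}

# zip: every member is compressed separately, then stored in the archive.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)
zip_bytes = zip_buf.getvalue()

# tar.gz: members are archived first, then the whole stream is compressed,
# so the compressor can exploit repetition between files.
tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
tgz_bytes = tgz_buf.getvalue()

print(f"zip    : {len(zip_bytes)} bytes")
print(f"tar.gz : {len(tgz_bytes)} bytes")

# Random access: zip can locate and decompress a single member directly,
# whereas a tar.gz has to be decompressed sequentially up to that member.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    one_member = zf.read("file_25.bin")
```

On this input the zip stays close to the combined input size, because no single member compresses, while the tar.gz shrinks to a small fraction of it.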
0

Archive:

  • An archive file is a collection of files and directories stored in one file.
  • The archive file is not compressed — it uses the same amount of disk space as all the individual files and directories combined.

Compress:

  • A compressed file is a collection of files and directories that are stored in one file and stored in a way that uses less disk space than all the individual files and directories combined.
  • If disk space is a concern, compress rarely-used files, or place all such files in a single archive file and compress it.

Source Url