5

I want to compress a directory in Linux. I created a tar.gz, but it turned out to be a big file, because the directory contains some *.o files and some pdf files.

Is there any way to compress a directory but exclude files larger than a predefined SIZE? There is an --exclude argument in the tar command, but I would like to reject files larger than 1 MB. The constraint is the size, not the name of the file.

cateof

4 Answers

4

Based on Jan-Philip Gehrcke's response:

find . -type f -size -1024k -print0 | tar -czf archive.tar.gz --null -T -

for files less than 1M. Tested on OS X and Ubuntu Linux.
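
If you want to sanity-check the result, listing the archive contents with sizes is a quick way to confirm nothing over the limit slipped in (assuming GNU tar or bsdtar; archive.tar.gz is the file created above):

tar -tvzf archive.tar.gz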

fredbaba
  • On debian squeeze I had this error: `tar: Multiple archive files require '-M' option` But it worked with something like: `find . -type f -size -100k | tar -cz -f test.tgz -T -` – Fluxine Dec 20 '13 at 11:59
  • @Fluxine Couldn't get that to work either. Had to look at http://stackoverflow.com/q/5891866/63736 and ended up with `find . -type f -size 1M -print0 | tar -vzcf backup.tar.gz --null -T -` – Bruce van der Kooij Nov 23 '14 at 09:24
  • The `...| tar c --null -T - ` solution works nicely on arbitrarily long file lists, but has a minor drawback: it stores the whole file list in memory. If you have lots of small files, that may be a problem. – P.Péter Feb 02 '15 at 10:15
  • I am getting a long print-out of files like `./file1\n./file1./file3\n` and an error `Cannot stat: File name too long tar: Exiting with failure status due to previous errors` on Ubuntu – Dima Lituiev May 20 '15 at 15:20
  • It can be fixed by including `-print0`, as described in [this discussion](https://superuser.com/questions/148020/using-find-and-tar-with-files-with-special-characters-in-the-name/148021#148021) – Dima Lituiev May 20 '15 at 15:27
1

The `...| tar c --null -T -` solution above is the best if you have adequate memory, i.e. the file list fits into memory easily (in most cases it does). However, xargs does have a place if you are memory-constrained, but you have to use it appropriately so that the multiple tar invocations have no ill effect.

To compress, you may use:

find . -type f -size -1024k -print0 | xargs -0 tar c | gzip > archive.tar.gz

This results in a file of concatenated tar archives, gzipped together into the resulting file. (You could also use `cz` and omit `| gzip`, since concatenated gzip streams are still valid gzip, but you lose a tiny bit of compression, or quite a bit if you use bzip2 or xz instead of gzip.)
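
As a quick illustration of that concatenation property (a throwaway sketch; demo.gz is just an example name):

printf 'hello\n' | gzip > demo.gz
printf 'world\n' | gzip >> demo.gz
zcat demo.gz    # prints both lines; gzip tools decompress all concatenated members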

To extract the resulting file you have to use the --ignore-zeros or -i option of tar, so that it does not stop after the first archive:

tar xizf archive.tar.gz
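
To see why -i matters, you can force xargs into small batches; the -n 2 below is only for demonstration and assumes at least a few matching files:

find . -type f -size -1024k -print0 | xargs -0 -n 2 tar c | gzip > archive.tar.gz
tar tzf archive.tar.gz     # stops after the first concatenated archive
tar tizf archive.tar.gz    # lists every file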
P.Péter
-1

You could use a combination of find (with its -size flag) and xargs to pass the results into tar.

Something like:

find . -size -100k -print0 | xargs -0 tar rvf archive.tar

for files less than 100k. See man find for the other size options
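
For reference, a few -size forms (these use GNU find suffixes; note the rounding gotcha):

find . -size -100k     # smaller than 100 KiB
find . -size +10M      # larger than 10 MiB
find . -size -1024k    # under 1 MiB; -1M would match only empty files,
                       # because find rounds sizes up to whole units first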

matt freake
-3

find ./myRep/ -type f -size -1024k | xargs tar cfvz myArchive.tar

In short, the first part of this expression constructs a list of files whose size is less than 1024k, recursively from ./myRep/, and the second part creates a tar/gzip archive.

Nic_tfm
  • This might invoke tar repeatedly. Have a look at `find . -print0 | tar --null -T - ...` (from the tar manual: "If you give a single dash as a file name for ‘--files-from’, (i.e., you specify either --files-from=- or -T -), then the file names are read from standard input.") – Dr. Jan-Philip Gehrcke Dec 14 '12 at 15:10