5

I want to compress a directory in Linux. I created a tar.gz, but it turned out to be a big file, because the directory contains some *.o files and some pdf files.

Is there any way to compress a directory but exclude files larger than a predefined SIZE? There is an --exclude argument in the tar command, but I would like to reject files larger than 1 MB. The constraint is the size, not the name of the file.

cateof

4 Answers

4

Based on Jan-Philip Gehrcke's response:

find . -type f -size -1024k -print0 | tar -czf archive.tar.gz --null -T -

for files less than 1M. Tested on OS X and Ubuntu Linux.
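
If you want to sanity-check the result, listing the archive contents with sizes is a quick way to confirm nothing over the limit slipped in (assuming GNU tar or bsdtar; archive.tar.gz is the file created above):

tar -tvzf archive.tar.gz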

fredbaba
  • On debian squeeze I had this error: `tar: Multiple archive files require '-M' option` But it worked with something like: `find . -type f -size -100k | tar -cz -f test.tgz -T -` – Fluxine Dec 20 '13 at 11:59
  • @Fluxine Couldn't get that to work either. Had to look at http://stackoverflow.com/q/5891866/63736 and ended up with `find . -type f -size 1M -print0 | tar -vzcf backup.tar.gz --null -T -` – Bruce van der Kooij Nov 23 '14 at 09:24
  • The `...| tar c --null -T - ` solution works nicely on arbitrarily long file lists, but has a minor drawback: it stores the whole file list in memory. If you have lots of small files, that may be a problem. – P.Péter Feb 02 '15 at 10:15
  • I am getting a long print-out of files like `./file1\n./file1./file3\n` and an error `Cannot stat: File name too long tar: Exiting with failure status due to previous errors` on Ubuntu – Dima Lituiev May 20 '15 at 15:20
  • It can be fixed by including `-print0`, as described in [this discussion](https://superuser.com/questions/148020/using-find-and-tar-with-files-with-special-characters-in-the-name/148021#148021) – Dima Lituiev May 20 '15 at 15:27
1

The `...| tar c --null -T -` solution above is the best if you have adequate memory, i.e. the file list fits into memory easily (in most cases it does). However, xargs does have a place if you are memory-constrained, but you have to use it appropriately so that the multiple tar invocations have no ill effect.

To compress, you may use:

find . -type f -size -1024k -print0 | xargs -0 tar c | gzip > archive.tar.gz

This results in a file of concatenated tar archives, gzipped together into the resulting file. (You could also use `cz` and omit `| gzip`, since concatenated gzip streams are still valid gzip, but you lose a tiny bit of compression, or quite a bit if you use bzip2 or xz instead of gzip.)
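
As a quick illustration of that concatenation property (a throwaway sketch; demo.gz is just an example name):

printf 'hello\n' | gzip > demo.gz
printf 'world\n' | gzip >> demo.gz
zcat demo.gz    # prints both lines; gzip tools decompress all concatenated members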

To extract the resulting file you have to use the --ignore-zeros or -i option of tar, so that it does not stop after the first archive:

tar xizf archive.tar.gz
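
To see why -i matters, you can force xargs into small batches; the -n 2 below is only for demonstration and assumes at least a few matching files:

find . -type f -size -1024k -print0 | xargs -0 -n 2 tar c | gzip > archive.tar.gz
tar tzf archive.tar.gz     # stops after the first concatenated archive
tar tizf archive.tar.gz    # lists every file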
P.Péter
-1

You could use a combination of find (with its -size flag) and xargs to pass the results into tar.

Something like:

find . -size -100k -print0 | xargs -0 tar rvf archive.tar

for files less than 100k. See man find for the other size options
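
For reference, a few -size forms (these use GNU find suffixes; note the rounding gotcha):

find . -size -100k     # smaller than 100 KiB
find . -size +10M      # larger than 10 MiB
find . -size -1024k    # under 1 MiB; -1M would match only empty files,
                       # because find rounds sizes up to whole units first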

matt freake
-3

find ./myRep/ -type f -size -1024k | xargs tar cfvz myArchive.tar

In short, the first part of this expression constructs a list of files whose size is less than 1024k, recursively from ./myRep/, and the second part creates a tar/gzip archive.

Nic_tfm
  • This might invoke tar repeatedly. Have a look at `find . -print0 | tar --null -T - ...` (from the tar manual: "If you give a single dash as a file name for ‘--files-from’, (i.e., you specify either --files-from=- or -T -), then the file names are read from standard input.") – Dr. Jan-Philip Gehrcke Dec 14 '12 at 15:10