I'm trying to write code that packs a few big files (from tens to hundreds of gigabytes) into one archive. The compression methods supported by the tarfile module are a bit slow for that much data, so I would like to use an external compression module such as lz4 to achieve better compression speed. Unfortunately, I can't find a way to create a tar file and compress it with lz4 on the fly, so as to avoid creating a temporary tar file. The documentation of the tarfile module says that an uncompressed stream can be opened for writing using the 'w|' mode. Is this the way to stream a tar file directly to the lz4 module? If so, what's the proper way to use it? Thank you very much.
3
-
possible duplicate of [How to create full compressed tar file using Python?](http://stackoverflow.com/questions/2032403/how-to-create-full-compressed-tar-file-using-python) – Aditya Jun 15 '15 at 06:21
-
Unfortunately, no. That question covers standard methods of compression available in tarfile module itself. I'm trying to understand how to compress tar file on the fly with some method that is not available in tarfile module. I've edited the title of my question to make it a bit more clear. Thanks. – Trevor_Numbers Jun 15 '15 at 06:32
-
Ok in that case it's the real problem. Question has been unflagged... – Aditya Jun 15 '15 at 06:39
-
hmm but GNU tar only recognizes gz and bz2. I understand lz4 is better in terms of speed, but you are creating non compatible archive. – Kenji Noguchi Jun 15 '15 at 06:47
-
@KenjiNoguchi, not sure if I understood what you mean. As far as I know, on Unix-like systems tar has traditionally been used just as a container that keeps files together in one file, whether or not you use built-in compression. If you packed an uncompressed tar into lz4, you can always un-lz4 it via lz4, and then just un-tar the resulting file via tar. Thanks! – Trevor_Numbers Jun 15 '15 at 07:10
-
ok, gotcha. I haven't tried myself but I think you can open a file stream to lz4 command, and pass the file object to `tarfile.open` – Kenji Noguchi Jun 15 '15 at 07:34
-
@KenjiNoguchi, thank you very much! I'll read about this and will try to implement it. – Trevor_Numbers Jun 15 '15 at 07:47
-
I was looking to integrate `tarfile` and `python-lz4` but the lz4 module does not support streaming. So it's not possible. – Kenji Noguchi Jun 15 '15 at 08:14
-
@KenjiNoguchi I've looked into the lz4 module and indeed it only accepts string data as input. Thanks for pointing that out. In this case maybe there's no reason to use the tarfile module at all, and it would be easier just to call system commands. Thank you very much again, you've really helped me understand how to solve this task in a simpler way. – Trevor_Numbers Jun 15 '15 at 10:35
2 Answers
5
Per our conversation above.
import tarfile
import subprocess
p = subprocess.Popen(['lz4', '-'], stdin=subprocess.PIPE)
tar = tarfile.open(fileobj=p.stdin, mode="w|")
From there you can do the usual `tar.addfile`. FYI, as I stated in the conversation: GNU tar can auto-detect gz and bz2 but not lz4, so you have to run `lz4 -c -d stdin.lz4 | tar xf -` to extract the files. If you simply ran `tar xf` on the archive, it would fail.
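For completeness, here is a slightly fuller sketch of the same idea. It assumes the `lz4` command-line tool is on your PATH; the function name `tar_through_command` and its parameters are made up for illustration. It sends the compressor's output to a file, closes the pipe once the tar stream is finished, and checks the exit status. `-c` is the same "write to stdout" flag used in the extraction command above.

```python
import subprocess
import tarfile


def tar_through_command(paths, out_path, command=("lz4", "-c", "-")):
    """Stream an uncompressed tar of `paths` through `command` into `out_path`.

    `command` must read from stdin and write to stdout; the default
    assumes the lz4 CLI is installed.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(list(command), stdin=subprocess.PIPE, stdout=out)
        try:
            # "w|" produces a non-seekable tar stream, so a pipe is fine as fileobj.
            with tarfile.open(fileobj=proc.stdin, mode="w|") as tar:
                for path in paths:
                    tar.add(path)
        finally:
            # tarfile does not close a fileobj it was given; closing the pipe
            # tells the compressor that the input has ended.
            proc.stdin.close()
            returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("compressor exited with status %d" % returncode)
```

Usage would look like `tar_through_command(["bigfile1", "bigfile2"], "mypack.tar.lz4")`. No temporary tar file is created, because the tar stream goes straight into the compressor's stdin.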

Kenji Noguchi
-
I can't vote yet unfortunately, but your help is more than appreciated. Thank you very much. – Trevor_Numbers Jun 15 '15 at 07:59
1
You can pipe the result of the `tar` command directly to the `lz4` utility. This avoids the use of any intermediate file. Here is an example (assuming you have both `tar` and `lz4` installed on your system):
tar cvf - * | lz4 > mypack.tar.lz4
The `-` here tells `tar` to write its output to `stdout`. Of course, you can replace the `*` with whichever target you want to tar.
The reverse operation is also possible:
lz4 -d mypack.tar.lz4 | tar xv
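If you need the reverse operation from Python rather than the shell, a rough equivalent might look like the sketch below. It again assumes the `lz4` CLI is installed; `untar_from_command` and its parameters are hypothetical names, not part of any library. The decompressor's stdout is read by tarfile in `'r|'` (stream) mode, so no intermediate tar file is written.

```python
import subprocess
import tarfile


def untar_from_command(archive_path, dest_dir, command=("lz4", "-d", "-c")):
    """Decompress `archive_path` with `command` and unpack the tar stream.

    `command` receives the archive path as its last argument and must
    write the decompressed data to stdout (default assumes the lz4 CLI).
    """
    proc = subprocess.Popen(list(command) + [archive_path],
                            stdout=subprocess.PIPE)
    try:
        # "r|" reads a non-seekable tar stream block by block.
        with tarfile.open(fileobj=proc.stdout, mode="r|") as tar:
            # Only extract archives you trust.
            tar.extractall(dest_dir)
        # Drain any trailing padding so the child is not cut off mid-write.
        while proc.stdout.read(65536):
            pass
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("decompressor exited with status %d" % returncode)
```

For example, `untar_from_command("mypack.tar.lz4", "restored")` would unpack into the `restored` directory.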

Cyan