I am trying to extract a zipped folder, but instead of calling `.extractall()` directly, I want to extract each file into a stream so that I can handle the stream myself. Is it possible to do this using `tarfile`? Or are there any other suggestions?
13

Martijn Pieters

Robin W.
- Do you mean the `tarfile` library? – Chris Medrela Nov 26 '12 at 09:43
- Yes, sorry for the typo. – Robin W. Nov 26 '12 at 10:01
2 Answers
23
You can obtain each file from a tar file as a Python file object using the `.extractfile()` method. Loop over the `tarfile.TarFile()` instance to list all entries:

    import tarfile

    with tarfile.open(path) as tf:
        for entry in tf:  # list each entry one by one
            fileobj = tf.extractfile(entry)
            # fileobj is now an open file object. Use `.read()` to get the data.
            # alternatively, loop over `fileobj` to read it line by line.
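
As a minimal sketch of the loop above (assuming Python 3), the example below collects the bytes of every regular file into a dictionary and skips entries for which `.extractfile()` returns `None`, such as directories. The helper names `read_regular_files` and `build_demo_archive`, and the tiny in-memory archive, are hypothetical and exist only to make the example self-contained.

```python
import io
import tarfile


def read_regular_files(tf):
    """Return {member name: bytes} for every regular file in an open TarFile."""
    contents = {}
    for entry in tf:
        fileobj = tf.extractfile(entry)
        if fileobj is None:
            # Directories (and device or FIFO entries) have no data stream.
            continue
        contents[entry.name] = fileobj.read()
    return contents


def build_demo_archive():
    """Build a tiny in-memory tar with one directory and one file (demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        directory = tarfile.TarInfo("docs")
        directory.type = tarfile.DIRTYPE
        tf.addfile(directory)

        payload = b"hello\n"
        info = tarfile.TarInfo("docs/readme.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_demo_archive()) as tf:
    demo_contents = read_regular_files(tf)
```

Reading everything into memory is only reasonable for small archives; for large members, reading `fileobj` in chunks is likely the better choice.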

Martijn Pieters
- And if the fileobj is a gzip file, would it be possible to decompress it? – Werner Sep 09 '15 at 12:53
- @Werner: the `tarfile` module takes care of compression for you. See the [`tarfile.open()` documentation](https://docs.python.org/2/library/tarfile.html#tarfile.open); the default mode is `r`, which transparently detects compression and handles decompression as needed. – Martijn Pieters Sep 09 '15 at 12:54
- Yes, but inside the tarfile I have a gzip file (unfortunately someone created a compressed tarfile with my gzip file…). `extractfile` returns a `tarfile.ExFileObject`, which cannot be used to open a `gzip.GzipFile`. Would there be a way to open this gzip file without extracting the tarfile to disk and opening the resulting file? – Werner Sep 09 '15 at 12:58
- @Werner: I take it you are using Python 2 then? Python 3's `gzip` module should take that object without issues, but the Python 2 version still tries to seek on the file object. Either upgrade to Python 3, copy the file to disk first, or decode the stream as you read it; see [Python decompressing gzip chunk-by-chunk](http://stackoverflow.com/q/2423866) – Martijn Pieters Sep 09 '15 at 13:03
- Yes, still on Python 2, unfortunately, and it's not possible to upgrade as it is part of the environment. OK, thanks a lot! Couldn't find any information on this… – Werner Sep 09 '15 at 13:10
- Evidently you need to be careful with directories; you'll get an entry for them, but when you call `extractfile()` on them, `None` will be returned. – weberc2 Dec 02 '15 at 14:49
- @MartijnPieters A side question (not sure if it's worth posting as a standalone question): what needs to be done to clean up a file object obtained from the `extractfile` method? Is the file extracted to disk anywhere, and does that need explicit deletion? (Python 3) – 0xc0de Jun 08 '20 at 09:57
- @0xc0de: `extractfile` reads directly from the `TarFile` stream; no temp files are created on disk, and no cleanup is needed. – Martijn Pieters Jun 09 '20 at 22:42
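
For the nested gzip case discussed in the comments, a minimal Python 3 sketch could look like the following. It assumes the object returned by `extractfile()` can be passed straight to `gzip.GzipFile(fileobj=...)`, as the comment above suggests for Python 3. The helper names and the member name `data.txt.gz` are hypothetical, and the archive is built in memory purely for demonstration.

```python
import gzip
import io
import tarfile


def read_gzip_member(tf, name):
    """Decompress a gzip file stored inside an open TarFile, without touching disk."""
    member = tf.extractfile(name)
    if member is None:
        raise ValueError(f"{name!r} is not a regular file in the archive")
    # Python 3's gzip module accepts the object returned by extractfile().
    with gzip.GzipFile(fileobj=member) as gz:
        return gz.read()


def build_demo_archive():
    """Build an in-memory .tar.gz that contains a gzip file (demo data)."""
    inner = gzip.compress(b"nested payload\n")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        info = tarfile.TarInfo("data.txt.gz")
        info.size = len(inner)
        tf.addfile(info, io.BytesIO(inner))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_demo_archive()) as tf:
    nested_text = read_gzip_member(tf, "data.txt.gz")
```

Nothing is written to disk here: both layers of decompression happen on in-memory streams.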
1
I was unable to use `extractfile` while network streaming a tar file, so I did something like this instead:

    import io
    import tarfile
    import requests
    from backports.lzma import LZMAFile

    # LZMAFile needs a file-like object, so wrap the downloaded bytes
    content = requests.get('http://some.com/some.tar.xz').content
    some_streamed_tar = LZMAFile(io.BytesIO(content))
    with tarfile.open(fileobj=some_streamed_tar) as tf:
        tf.extractall(path="/tmp", members=None)
And to read them:

    import os

    for fn in os.listdir("/tmp"):
        with open(os.path.join("/tmp", fn)) as f:
            print(f.read())

Python 2.7.13

jmunsch
- You can also achieve this directly with streaming, i.e. without any temporary files: https://stackoverflow.com/a/34131505/19163 – vog Jun 01 '18 at 08:58
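
One way to stream without temporary files, sketched here under the assumption of Python 3, is to open the archive in a stream mode such as `"r|*"`, which reads forward only and detects the compression. This is not necessarily the approach taken in the linked answer. The `ForwardOnlyReader` class is a hypothetical stand-in for a network body that only supports `read()` (for example, the `raw` attribute of a `requests` response made with `stream=True`), and the other helper names are likewise illustrative.

```python
import io
import tarfile


class ForwardOnlyReader:
    """Demo stand-in for a network response body: it exposes read() only."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


def iter_stream_members(stream):
    """Yield (name, bytes) for each regular file in a tar stream, in order."""
    # "r|*" reads the archive as a forward-only stream and detects compression.
    with tarfile.open(fileobj=stream, mode="r|*") as tf:
        for entry in tf:
            fileobj = tf.extractfile(entry)
            if fileobj is None:  # directories and similar entries
                continue
            # Read now: a stream cannot seek back to this member later.
            yield entry.name, fileobj.read()


def build_demo_archive():
    """Return the bytes of a small xz-compressed tar (demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tf:
        for name, payload in [("a.txt", b"first\n"), ("b.txt", b"second\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


streamed = list(iter_stream_members(ForwardOnlyReader(build_demo_archive())))
```

In stream mode each member has to be consumed as it is encountered, which is why the generator reads the data before moving on to the next entry.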