My goal is to extract the files from a .tar.gz archive without recreating the sub-directories that lead up to each file.

My code is based off this question except instead of unpacking a .zip I am unpacking a .tar.gz file.

I am asking this question because the error I'm getting is very vague and doesn't identify the problem in my code:

import os
import shutil
import tarfile

with tarfile.open('RTLog_20150425T152948.gz', 'r:gz') as tar:
    for member in tar.getmembers():
        filename = os.path.basename(member.name)
        if not filename:
            continue

        # copy file (taken from zipfile's extract)
        source = member
        target = open(os.path.join(os.getcwd(), filename), "wb")
        with source, target:
            shutil.copyfileobj(source, target)

As you can see I copied the code from the linked question and tried to change it to deal with .tar.gz members instead of .zip members. Upon running the code I get the following error:

Traceback (most recent call last):
  File "C:\Users\dzhao\Desktop\123456\444444\blah.py", line 27, in <module>
    with source, target:
AttributeError: __exit__

From the reading I've done, shutil.copyfileobj takes as input two "file-like" objects. member is a TarInfo object. I'm not sure if a TarInfo object is a file-like object so I tried changing this line from:

source = member #to
source = open(os.path.join(os.getcwd(), member.name), 'rb')

But this understandably raised an error where the file wasn't found.

What am I not understanding?

Dzhao

1 Answer

This code has worked for me:

import os
import shutil
import tarfile

with tarfile.open(fname, "r|*") as tar:
    counter = 0

    for member in tar:
        if member.isfile():
            filename = os.path.basename(member.name)
            if filename != "myfile": # do your check
                continue

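            # extractfile() returns a file-like object limited to this member's data.
            # (The third argument of copyfileobj is only a buffer size, not a byte limit,
            # so copying from tar.fileobj would read past the end of the member.)
            with tar.extractfile(member) as source, open("output.file", "wb") as output:
                shutil.copyfileobj(source, output)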

            break # got our file

        counter += 1
        if counter % 1000 == 0:
            tar.members = [] # free ram... yes we have to do this manually

But your problem might not be the extraction at all; your file may not actually be a .tar.gz but just a plain .gz file.
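
One quick way to check that point is tarfile.is_tarfile, which reports whether a path can be opened as a tar archive (compressed or not). This is only a sketch: the helper name looks_like_tar and both file names are made up for illustration, and the sample files are built in a temporary directory.

```python
import gzip
import io
import os
import tarfile
import tempfile


def looks_like_tar(path):
    """Return True if tarfile can open path as a (possibly compressed) tar archive."""
    return tarfile.is_tarfile(path)


# Build two throwaway files: a real .tar.gz and a plain gzip-compressed file.
workdir = tempfile.mkdtemp()
real_tar = os.path.join(workdir, "real.tar.gz")  # hypothetical name
plain_gz = os.path.join(workdir, "plain.gz")     # hypothetical name

payload = b"hello"
with tarfile.open(real_tar, "w:gz") as tar:
    info = tarfile.TarInfo(name="logs/2015/data.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

with gzip.open(plain_gz, "wb") as fh:
    fh.write(payload)

print(looks_like_tar(real_tar))
print(looks_like_tar(plain_gz))
```

If the check fails for your file, gzip.open is probably the right tool rather than tarfile.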

Edit: You're also getting the error on the with line because Python is trying to use the member object as a context manager, which requires __enter__ and __exit__ methods. A TarInfo object has neither, which is why the traceback mentions __exit__.
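
For the goal stated in the question (every file extracted flat, without its directories), a minimal sketch could look like the following. It relies on tar.extractfile(), which returns a file-like object for a member and can be used in a with statement, unlike the TarInfo object itself. The function name extract_flat and the sample archive contents are assumptions made for the example.

```python
import io
import os
import shutil
import tarfile
import tempfile


def extract_flat(archive_path, dest_dir):
    """Copy every regular file in the archive into dest_dir, dropping its directories."""
    written = []
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            filename = os.path.basename(member.name)
            if not filename:
                continue
            target_path = os.path.join(dest_dir, filename)
            # extractfile() gives a readable file-like object for this member
            with tar.extractfile(member) as source, open(target_path, "wb") as target:
                shutil.copyfileobj(source, target)
            written.append(filename)
    return written


# Build a small sample archive with nested paths (made-up contents).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "sample.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for name, data in [("top/sub/a.txt", b"alpha"), ("top/b.txt", b"beta")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

dest = os.path.join(workdir, "out")
os.makedirs(dest)
written = extract_flat(archive, dest)
print(sorted(written))
```

Note that two members with the same base name in different directories would overwrite each other here, so add a collision check if that can happen in your archives.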

Simon Kirsten
  • I know that my file is for sure a .tar.gz. My initial fear was correct when I removed my `with source, target` line. It seems that my source wasn't a file-like object. I'll try your code after I read up on what `tar.fileobj` does. – Dzhao Jun 10 '16 at 17:13
  • The fix was to change source to `tar.fileobj`. Interestingly, when I did a ctrl+f on the tarfile documentation page, it wasn't listed as a function, so it must be an attribute. But the important thing is that `tar.fileobj` is a file-like object, so now my code works =) Thanks! – Dzhao Jun 10 '16 at 17:18
  • You're welcome. If you are dealing with large files (hundreds of MB), I highly recommend including the lines that free RAM. This is not mentioned in the documentation, but it will catch you by surprise if your scripts suddenly fail because RAM usage is way too high. – Simon Kirsten Jun 10 '16 at 17:22