
My goal is to extract a file from a .tar.gz archive without also extracting the subdirectories that precede it in the archive path. I am trying to model my method on this question. I already asked a question of my own, but the answer I thought would work didn't fully work.

In short, shutil.copyfileobj isn't copying the contents of my file.

My code is now:

```python
import os
import shutil
import tarfile
import gzip

with tarfile.open('RTLog_20150425T152948.gz', 'r:*') as tar:
    for member in tar.getmembers():
        filename = os.path.basename(member.name)
        if not filename:
            continue

        source = tar.fileobj
        target = open('out', "wb")
        shutil.copyfileobj(source, target)
```

Running this code successfully created the file `out`, but the file was empty. I know that the file I want to extract does, in fact, contain plenty of data (approximately 450 KB); `print(member.size)` returns 1564197.

My attempts to solve this were unsuccessful. `print(type(tar.fileobj))` told me that `tar.fileobj` is a `<gzip _io.BufferedReader name='RTLog_20150425T152948.gz' 0x3669710>`.

So I tried changing `source` to `source = gzip.open(tar.fileobj)`, but this raised the following error:

```
Traceback (most recent call last):
  File "C:\Users\dzhao\Desktop\123456\444444\blah.py", line 15, in <module>
    shutil.copyfileobj(source, target)
  File "C:\Python34\lib\shutil.py", line 67, in copyfileobj
    buf = fsrc.read(length)
  File "C:\Python34\lib\gzip.py", line 365, in read
    if not self._read(readsize):
  File "C:\Python34\lib\gzip.py", line 433, in _read
    if not self._read_gzip_header():
  File "C:\Python34\lib\gzip.py", line 297, in _read_gzip_header
    raise OSError('Not a gzipped file')
OSError: Not a gzipped file
```

Why isn't shutil.copyfileobj actually copying the contents of the file in the .tar.gz?

Dzhao
  • You're missing the size option in `shutil.copyfileobj()`. You should be including `member.size`, else it'll read to the end of the tar file. I doubt it'll fix your actual problem though – Alastair McCormack Jun 10 '16 at 21:19
  • @AlastairMcCormack Oh, I didn't realize it read to the end of the tarfile. In this case it isn't a big deal because the tarfile contains only one file, but gee, thanks! – Dzhao Jun 10 '16 at 21:20
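For reference, the Python docs describe the third parameter of `shutil.copyfileobj(fsrc, fdst, length)` as a buffer size, so passing `member.size` changes how much is read per chunk but does not stop the copy after that many bytes. If a hard byte limit is really wanted, the reads have to be bounded by hand. A minimal sketch, using an in-memory stream in place of the archive; `copy_limited` is a hypothetical helper, not part of `shutil`:

```python
import io
import shutil


def copy_limited(fsrc, fdst, limit, bufsize=64 * 1024):
    """Copy at most `limit` bytes from fsrc to fdst and return the count copied.

    Hypothetical helper for illustration; shutil has no such function.
    """
    remaining = limit
    while remaining > 0:
        chunk = fsrc.read(min(bufsize, remaining))
        if not chunk:  # source ended before the limit was reached
            break
        fdst.write(chunk)
        remaining -= len(chunk)
    return limit - remaining


# Simulated stream: 11 bytes of "member" data followed by 1024 bytes of padding.
data = b"member-data" + b"\0" * 1024

# shutil.copyfileobj: the third argument only sets the chunk size,
# so everything up to end-of-stream is still copied.
dst_all = io.BytesIO()
shutil.copyfileobj(io.BytesIO(data), dst_all, 11)

# copy_limited stops after exactly 11 bytes.
dst_limited = io.BytesIO()
copied = copy_limited(io.BytesIO(data), dst_limited, 11)
```

With `tar.extractfile(member)` a manual limit is unnecessary, because the object it returns already ends at the member's last byte.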

1 Answer


`fileobj` isn't a documented attribute of `TarFile`. It's probably an internal object representing the whole archive stream, not anything specific to the current member.

Use `TarFile.extractfile()` to get a file-like object for a specific member:
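This likely also explains the `Not a gzipped file` error: with mode `'r:*'`, `tarfile` has already detected and unwrapped the gzip layer, so the internal stream most likely yields decompressed tar data, and `gzip.open()` on top of it asks for a second decompression. A small sketch reproducing that with an in-memory archive; it uses `gzip.decompress` rather than the undocumented `fileobj` attribute, and `make_targz` is a hypothetical helper:

```python
import gzip
import io
import tarfile


def make_targz(name, data):
    """Build a small .tar.gz archive in memory holding one member (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


archive = make_targz("logs/RTLog.txt", b"hello\n")

# One round of gzip decompression, which tarfile performs itself for "r:*".
tar_stream = gzip.decompress(archive)

# The result is a plain tar stream: the "ustar" magic sits at byte offset 257.
is_tar = tar_stream[257:262] == b"ustar"

# Decompressing it a second time fails, like gzip.open(tar.fileobj) did.
try:
    gzip.decompress(tar_stream)
    error = None
except OSError as exc:  # gzip.BadGzipFile (Python 3.8+) subclasses OSError
    error = exc
```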

```python
…
source = tar.extractfile(member)  # file-like object for this member only
with open("out", "wb") as target:
    shutil.copyfileobj(source, target)
```
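A self-contained sketch of the whole approach, assuming the wanted member is identified by its basename and that only regular files should be copied (`extract_flat` and the sample archive layout are hypothetical). It builds a throwaway .tar.gz with nested directories, then writes the member's bytes to a flat output file:

```python
import io
import os
import shutil
import tarfile
import tempfile


def extract_flat(archive_path, wanted_name, out_path):
    """Copy one member's contents to out_path, dropping its directory prefix.

    Hypothetical helper: matches on the member's basename and skips
    directories, links and other non-regular members.
    """
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            if os.path.basename(member.name) != wanted_name:
                continue
            source = tar.extractfile(member)  # nothing is written to disk here
            with source, open(out_path, "wb") as target:
                shutil.copyfileobj(source, target)
            return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    # Sample archive with the file buried under several directories.
    archive = os.path.join(tmp, "logs.tar.gz")
    payload = b"log line\n" * 1000
    with tarfile.open(archive, "w:gz") as tar:
        info = tarfile.TarInfo("deep/nested/dirs/RTLog.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    out = os.path.join(tmp, "out")
    found = extract_flat(archive, "RTLog.txt", out)
    with open(out, "rb") as f:
        copied = f.read()
    leftover = sorted(os.listdir(tmp))  # no "deep/" directory gets created
```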
  • `extractfile()` definitely extracts the file but I get the messy subdirs preceding the file. I want to copy only the file and not the subdirs that come with it. – Dzhao Jun 10 '16 at 21:03
  • Huh? Despite the (misleading!) name, `tar.extractfile()` doesn't extract anything on its own. It just returns a file-like object. –  Jun 10 '16 at 21:11
  • OK, sorry, I mistook `extractfile()` for `extract()` or `extractall()`. I tried your code and it works. While I THOUGHT I had read the tarfile docs well enough, I clearly had NOT. Thank you! – Dzhao Jun 10 '16 at 21:16
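The difference discussed in these comments can be checked directly. A minimal sketch, assuming a throwaway archive containing a single member `a/b/c.txt`: `extractfile()` only returns the bytes, while `extract()` recreates the member's directory path on disk:

```python
import io
import os
import tarfile
import tempfile

payload = b"hello\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo("a/b/c.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

with tempfile.TemporaryDirectory() as tmp:
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:*") as tar:
        # Use the "data" extraction filter when this Python provides it
        # (3.12+ and some security backports); otherwise keep the old default.
        tar.extraction_filter = getattr(tarfile, "data_filter", None)
        member = tar.getmember("a/b/c.txt")
        with tar.extractfile(member) as f:
            data = f.read()  # read into memory only
        after_extractfile = os.listdir(tmp)  # still empty
        tar.extract(member, path=tmp)  # creates tmp/a/b/c.txt plus a/ and a/b/
    created = os.path.isfile(os.path.join(tmp, "a", "b", "c.txt"))
```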