I download a bz2 file using Python. Then I want to unpack the archive using:

import os

def unpack_file(dir, file):
    cwd = os.getcwd()
    os.chdir(dir)
    print "Unpacking file %s" % file
    cmd = "tar -jxf %s" % file
    print cmd
    os.system(cmd)
    os.chdir(cwd)

Unfortunately, this ends with an error:

bzip2: Compressed file ends unexpectedly;
    perhaps it is corrupted?  *Possible* reason follows.
bzip2: Inappropriate ioctl for device
    Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

However, I can unpack the archive from the shell without any problem.

Do you have any idea what I am doing wrong?

Szymon Lipiński
  • Could you show us the exact command you run in the shell, and the exact command (including the filename) that you pass to `os.system()`? – NPE Jan 17 '12 at 10:57
  • Please use [`subprocess.Popen`](http://docs.python.org/library/subprocess.html#replacing-os-system) instead of `os.system`. – jcollado Jan 17 '12 at 11:14
  • How are you downloading the file? If you put in a sleep(15) before calling unpack, does that still have the same error? – Foon Jan 17 '12 at 20:29

2 Answers

For the record, the Python standard library ships with the `tarfile` module, which automatically handles the tar, tar.bz2, and tar.gz formats.

Additionally, you can do nifty things like get file lists, extract subsets of files or directories, or process the archive as a stream (i.e. you don't have to decompress the whole file and then untar it; everything is done in small chunks).

import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
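
To illustrate the file-list and subset points, here is a minimal sketch. The helper names `extract_subset` and `make_sample_archive`, the `docs/` and `src/` layout, and the prefix match are all assumptions made up for the example, not part of `tarfile` itself; only `tarfile.open`, `getnames`, `getmembers`, `add`, and `extractall` are real library calls.

```python
import os
import tarfile
import tempfile


def extract_subset(archive_path, dest_dir, prefix):
    """List every member, then extract only those whose name starts with prefix."""
    # Mode "r:*" lets tarfile detect plain tar, tar.gz, or tar.bz2 by itself.
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        wanted = [m for m in tar.getmembers() if m.name.startswith(prefix)]
        tar.extractall(dest_dir, members=wanted)
    return names


def make_sample_archive(path):
    """Build a tiny .tar.bz2 so the sketch can run on its own (demo data only)."""
    src = tempfile.mkdtemp()
    for rel in ("docs/readme.txt", "src/main.py"):
        full = os.path.join(src, *rel.split("/"))
        os.makedirs(os.path.dirname(full))
        with open(full, "w") as fh:
            fh.write("demo\n")
    with tarfile.open(path, "w:bz2") as tar:
        tar.add(os.path.join(src, "docs"), arcname="docs")
        tar.add(os.path.join(src, "src"), arcname="src")
```

Calling `extract_subset(archive, dest, "docs")` on the sample archive should unpack only the `docs` tree while still returning the full list of member names. For archives from untrusted sources, check member names before extracting, since they can contain absolute paths or `..` components.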
synthesizerpatel

I would do it like this:

import tarfile
target_folder = '.'
with tarfile.open("sample.tar.gz") as tar:
    tar.extractall(target_folder)

That's it. The `with` statement takes care of closing the archive.

If you want the paths of all the extracted files:

import os
filepaths = []
for (dirpath, dirnames, filenames) in os.walk(target_folder):
    filepaths.extend([os.path.join(dirpath, f) for f in filenames])
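
Applied to the `unpack_file` function from the question, this could look roughly like the sketch below. It keeps the question's function name but renames the parameters to `directory` and `filename` (my choice, to avoid shadowing built-ins), and it reads the paths from the archive members instead of walking the folder, so files that were already in the target directory are not included.

```python
import os
import tarfile


def unpack_file(directory, filename):
    """Extract directory/filename into directory; return paths of extracted files."""
    archive_path = os.path.join(directory, filename)
    # No os.chdir() needed: extractall() takes the destination explicitly.
    with tarfile.open(archive_path) as tar:
        tar.extractall(directory)
        # Regular files only; directory and link members are skipped.
        return [os.path.join(directory, m.name)
                for m in tar.getmembers() if m.isfile()]
```

Because nothing is handed to a shell, the working directory of the process stays untouched and the filename needs no quoting.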
Martin Thoma