I'm attempting to use Python's tarfile module to extract a tar.gz archive.

I'd like the extraction to overwrite any target files if they already exist - this is tarfile's normal behaviour.

However, I'm hitting a snag in that some of the files are write-protected (e.g. chmod 550).

The tarfile.extractall() operation actually fails:

IOError: [Errno 13] Permission denied '/foo/bar/file'

If I try to delete the files from the normal command-line, I can do it, I just need to answer a prompt:

$ rm <filename>
rm: <filename>: override protection 550 (yes/no)? yes

The normal GNU tar utility also handles these files effortlessly - it just overwrites them when you extract.

My user is the owner of the files, so it wouldn't be hard to recursively chmod the target files before running tarfile.extractall. Or I can use shutil.rmtree to blow away the target beforehand, which is the workaround I'm using now. However, that feels a little hackish.

Is there a more Pythonic way to handle overwriting read-only files within tarfile, using exceptions or something similar?

jscs
victorhooi

2 Answers

11

You could loop over the members of the tarball and extract / handle errors on each file:

In a modern version of Python I'd use the with statement:

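import os, tarfile

# errorlevel=1: fatal extraction errors are raised as exceptions
with tarfile.TarFile('myfile.tar', 'r', errorlevel=1) as tar:
    for file_ in tar:
        try:
            tar.extract(file_)
        except IOError as e:
            # The existing read-only file blocked the write: delete it and retry
            os.remove(file_.name)
            tar.extract(file_)
        finally:
            # Apply the mode recorded in the archive
            os.chmod(file_.name, file_.mode)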

If you can't use with, replace the with statement line with:

tar = tarfile.open('myfile.tar', 'r', errorlevel=1)
for file_ in tar:

If your tarball is gzipped, there's a quick shortcut to handle that:

tarfile.open('myfile.tar.gz', 'r:gz')

It would be nicer if tarfile.extractall had an overwrite option.
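Until something like that exists, the loop above can be wrapped in a small helper. This is only a minimal sketch: the name extract_overwriting and its dest parameter are made up for illustration, and it assumes you own the target files and trust the archive (member names are joined onto dest without any sanitising). Instead of waiting for the IOError, it removes an existing regular file before extracting over it:

```python
import os
import stat
import tarfile


def extract_overwriting(archive_path, dest="."):
    """Extract archive_path into dest, replacing existing (read-only) files.

    Hypothetical helper, not part of the tarfile module.
    """
    # "r:*" lets tarfile detect the compression (gzip, bz2, ...) itself.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            target = os.path.join(dest, member.name)
            # Only plain files are removed; directories and symlinks are left alone.
            if os.path.isfile(target) and not os.path.islink(target):
                # Make the file writable first so the removal also works
                # on platforms that refuse to delete read-only files.
                os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
                os.remove(target)
            # extract() applies the mode stored in the archive by default.
            tar.extract(member, path=dest)
```

Usage would be something like extract_overwriting('myfile.tar.gz', '/foo/bar'). Checking for the file up front avoids the try/except/finally dance, at the cost of one extra stat call per member.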

stderr
  • Awesome - that worked great =). Much cleaner than just mindlessly blowing away the directory. Small clarification - you're using "with", which I wasn't. I should probably switch to that - however, where should I insert "except ReadError" for the overall tarfile. Nested except's are bad practice, from what I understand? – victorhooi Aug 30 '11 at 01:16
  • The `with` statement will handle a `ReadError` exception raised while opening the archive. On error it will also automatically close the file. If you want more specific error handling you may want to explicitly open the file in an early `try/except` or you may want to write your own context manager that handles things differently. – stderr Sep 19 '14 at 18:52
  • So I was getting an "access denied" error when extracting `.sh` file from .tar.gz, and it was not a problem of overwriting already existing file - the dest folder was empty. I think it was due to "executable" attribute? Somehow, replacing `tarfile.open("1.tar.gz", "r")` with `tarfile.open("1.tar.gz", "r:gz")` solved it. Why?! According to docs, "r" is the same as "r:*", which for .gz archive is "r:gz". – Violet Giraffe Jul 16 '15 at 09:19
  • @stderr why the `os.chmod` in the `finally` block at the end? Won't the extracted file keep its permissions without having to set them again? – jpyams Oct 11 '17 at 19:09
  • Good answer -- but don't you mean "tar" instead of "tarball" (in the don't have "with" example). – PeterS6g Nov 28 '19 at 23:12
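
On the ReadError question raised in the comments: as far as I know, an exception raised by tarfile.open() itself happens before the with block is entered, so it propagates to the caller unless the whole statement is wrapped in try/except. A rough sketch of that outer wrapper (list_archive is a hypothetical name, and returning None on failure is just one possible policy):

```python
import tarfile


def list_archive(archive_path):
    """Return the member names, or None if the file is not a readable tar archive."""
    try:
        # The with block still guarantees the archive is closed once it is open.
        with tarfile.open(archive_path, "r:*") as tar:
            return tar.getnames()
    except tarfile.ReadError as exc:
        # Raised when the file cannot be opened as a tar archive at all.
        print("Could not read %s: %s" % (archive_path, exc))
        return None
```

This keeps the archive-level handler outside the per-file try/except, so the two don't have to be nested inside the loop.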
3

I was able to get Mike Steder's code to work like this:

import os
import tarfile

tarball = tarfile.open(filename, 'r:gz')
for f in tarball:
    try:
        tarball.extract(f)
    except IOError as e:
        # Existing read-only file in the way: remove it and extract again
        os.remove(f.name)
        tarball.extract(f)
    finally:
        os.chmod(f.name, f.mode)
Tim Santeford