I am not able to extract a generated tar.gz file, since extractall() complains that the target directory exists. However, if the extraction directory does not exist, it simply generates an empty file.

All the examples I've found online about extracting a tarfile either use no parameters for tarfile.extractall() (which means it attempts to extract it in the same directory and fails for me with IsADirectoryError) or make sure to create the extraction path beforehand.

This is using Python 3.5.2.

Reproduction script:

#!/usr/bin/python3

import os, tarfile, tempfile

# Create a test directory
test_dir = os.path.join(os.path.expanduser('~'), 'tarfile-test')
os.makedirs(test_dir, exist_ok=True)
os.chdir(test_dir)

# Create empty files to include in the tarfile
open('1.txt', 'a').close()
open('2.txt', 'a').close()
open('3.txt', 'a').close()

# Create the tarfile
compressed_file = 'packet.tgz'
with tarfile.open(compressed_file, 'w:gz') as tar:
    for f in os.listdir():
        tar.add(f, arcname=os.path.sep)

# Now attempt to extract it in three different places: a local directory, a
# temporary directory and a non-existent directory

# Local directory
local_dir = 'local-extraction'
os.makedirs(local_dir, exist_ok=True)
try:
    with tarfile.open(compressed_file, 'r:gz') as tar:
        tar.extractall(path=local_dir)
        print('Extracted in local dir!')
except IsADirectoryError:
    print('Failed to extract in local directory')

# Temporary directory
try:
    with tempfile.TemporaryDirectory() as tmp_dir:
        with tarfile.open(compressed_file, 'r:gz') as tar:
            tar.extractall(path=tmp_dir)
            print('Extracted in temporary dir!')
except IsADirectoryError:
    print('Failed to extract in temporary directory')

# Non-existent directory. This does not throw an exception, but fails to extract
# the files
non_existent = 'non_existent_dir'
with tarfile.open(compressed_file, 'r:gz') as tar:
    tar.extractall(path=non_existent)
    if os.path.isdir(non_existent):
        print('Extracted in previously non-existent dir!')
    else:
        print('Not extracted in non-existent dir')

Output:

$ ./repro.py 
Failed to extract in local directory
Failed to extract in temporary directory
Not extracted in non-existent dir

If we examine the contents of tarfile-test:

$ ll
total 16
drwxrwxr-x  3 user user 4096 Jul 11 08:38 ./
drwxr-xr-x 31 user user 4096 Jul 11 08:38 ../
-rw-rw-r--  1 user user    0 Jul 11 08:38 1.txt
-rw-rw-r--  1 user user    0 Jul 11 08:38 2.txt
-rw-rw-r--  1 user user    0 Jul 11 08:38 3.txt
drwxrwxr-x  2 user user 4096 Jul 11 08:38 local-extraction/
-rw-rw-r--  1 user user    0 Jul 11 08:38 non_existent_dir
-rw-rw-r--  1 user user  124 Jul 11 08:38 packet.tgz

non_existent_dir is an empty file, not a directory. local-extraction is empty.

What am I missing?

user2891462

1 Answer

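It looks like the problem is the arcname parameter used when creating the tar.gz file. I was (wrongly) following the advice in this comment. However, that should only be done when packing a directory; when adding individual files, it corrupts the tar.gz file. As far as I can tell, tarfile strips the leading separator from arcname, so every file ends up stored under the same empty name and extractall() then tries to write it at the extraction path itself. That would explain both the IsADirectoryError when the path already exists and the empty file when it does not.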
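
A quick way to check what arcname did to an archive is to list the names it actually stores. This is only a minimal sketch: member_names is a hypothetical helper I made up for illustration, and it works in a throwaway temporary directory rather than ~/tarfile-test.

```python
import os
import tarfile
import tempfile


def member_names(arcname=None):
    """Pack three empty files and return the member names stored in the archive.

    Hypothetical helper: if `arcname` is given it is passed unchanged to
    TarFile.add() for every file, otherwise each file keeps its own name.
    """
    with tempfile.TemporaryDirectory() as work_dir:
        archive = os.path.join(work_dir, 'packet.tgz')
        with tarfile.open(archive, 'w:gz') as tar:
            for name in ('1.txt', '2.txt', '3.txt'):
                path = os.path.join(work_dir, name)
                open(path, 'a').close()
                tar.add(path, arcname=name if arcname is None else arcname)
        with tarfile.open(archive, 'r:gz') as tar:
            return tar.getnames()


# With arcname=os.path.sep I would expect every stored name to be empty
print(member_names(arcname=os.path.sep))
# With per-file names the original file names should come back
print(member_names())
```

If the first list contains only empty strings, the archive has no usable file names and extraction cannot put the files anywhere sensible.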

Changing/removing the arcname parameter in tarfile.add() fixes it:

# Create the tarfile
compressed_file = 'packet.tgz'
with tarfile.open(compressed_file, 'w:gz') as tar:
    for f in os.listdir():
        tar.add(f)
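
To double-check the fix end to end, here is a rough round-trip sketch that packs the files under their own names and extracts them into a directory that does not exist yet. round_trip is a hypothetical helper name, the temporary directory layout is my own assumption, and the filter argument is only passed on Python versions whose tarfile module provides extraction filters (3.5.2 does not).

```python
import os
import tarfile
import tempfile


def round_trip(file_names):
    """Pack empty files under their own names, extract them, and list the result.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as work_dir:
        src_dir = os.path.join(work_dir, 'src')
        os.makedirs(src_dir)
        for name in file_names:
            open(os.path.join(src_dir, name), 'a').close()

        archive = os.path.join(work_dir, 'packet.tgz')
        with tarfile.open(archive, 'w:gz') as tar:
            for name in sorted(os.listdir(src_dir)):
                # Store each file under its bare name instead of os.path.sep
                tar.add(os.path.join(src_dir, name), arcname=name)

        # The target directory does not exist yet; extractall() creates it
        target = os.path.join(work_dir, 'non_existent_dir')
        # Newer Python versions support extraction filters; older ones do not
        kwargs = {'filter': 'data'} if hasattr(tarfile, 'data_filter') else {}
        with tarfile.open(archive, 'r:gz') as tar:
            tar.extractall(path=target, **kwargs)
        return sorted(os.listdir(target))


print(round_trip(['1.txt', '2.txt', '3.txt']))
```

Keeping the source files in their own subdirectory also avoids picking up the archive or earlier extraction directories when listing what to pack.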
user2891462