Python: Extract using tarfile but ignoring directories

Question

If I have a .tar file with a file '/path/to/file.txt', is there a way (in Python) to extract the file to a specified directory without recreating the directory '/path/to'?

score 30 · Answer 1 · answered May 09 '13 at 01:36

I ran into this problem as well; here is a complete example based on ekhumoro's answer:

import os, tarfile

tar_file = "archive.tar"  # placeholder: path to your archive
output_dir = "."
with tarfile.open(tar_file) as tar:
  for member in tar.getmembers():
    if member.isreg():  # skip anything that is not a regular file
      member.name = os.path.basename(member.name)  # drop the directory part of the name
      tar.extract(member, output_dir)  # extract into output_dir
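
The same loop can be wrapped in a reusable function. This is only a sketch: the name `extract_flat` and the duplicate-name check are illustrative assumptions, not part of the answer. The check is there because once paths are flattened, two members with the same basename would otherwise overwrite each other silently.

```python
import os
import tarfile


def extract_flat(tar_path, output_dir="."):
    """Extract every regular file in tar_path directly into output_dir.

    Hypothetical helper: the directory part of each member name is dropped.
    Returns the list of file names written, in archive order.
    """
    written = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not member.isreg():
                continue  # skip directories, links, devices, etc.
            name = os.path.basename(member.name)
            if name in written:
                # Assumption: treat a basename collision as an error
                raise ValueError(f"duplicate file name after flattening: {name}")
            member.name = name  # TarInfo attributes are writable
            tar.extract(member, output_dir)
            written.append(name)
    return written
```

Calling `extract_flat("archive.tar", "out")` would place each regular file directly in `out`, whatever its path inside the archive.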

score 16 · Answer 2 · answered Dec 06 '11 at 22:56

The data attributes of a TarInfo object are writable. So just change the name to whatever you want and then extract it:

import sys, os, tarfile

args = sys.argv[1:]
tar = tarfile.open(args[0])
member = tar.getmember(args[1])
member.name = os.path.basename(member.name)
path = args[2] if len(args) > 2 else ''
tar.extract(member, path)

ekhumoro

This technique also works with ZipFile / ZipInfo: https://stackoverflow.com/a/47632134/482828 – Ed Randall Jul 06 '20 at 18:40
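
A minimal sketch of the ZipFile variant mentioned in the comment above, assuming the same approach of rewriting the member's name before extraction. The helper name `extract_zip_member_flat` is hypothetical.

```python
import os
import zipfile


def extract_zip_member_flat(zip_path, member_name, output_dir="."):
    """Extract one member of a zip archive into output_dir without its directories."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member_name)
        # ZipInfo.filename is writable, like TarInfo.name
        info.filename = os.path.basename(info.filename)
        return zf.extract(info, output_dir)  # returns the path that was written
```

For example, `extract_zip_member_flat("archive.zip", "path/to/file.txt", "out")` would write `out/file.txt`.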

score 2 · Answer 3 · answered Dec 06 '11 at 20:02

According to the tarfile module documentation, this should be easy, although I haven't tested it.

TarFile.extract(member, path="")

Documentation:

Extract a member from the archive to the current working directory, using its full name. Its file information is extracted as accurately as possible. member may be a filename or a TarInfo object. You can specify a different directory using path.

So you should be able to do

TarFile.extract(member, path=".")

See the full documentation at: http://docs.python.org/library/tarfile.html

pyfunc

When the docs say "to the current working directory, using its full name", the "full name" is actually a path. They might more accurately say "using its full path, starting from the current working directory... You can specify a different starting directory using path." So this answer won't work. ekhumoro's answer seems better. – Weeble Nov 08 '12 at 11:38
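
The behaviour described in this comment can be checked with a small experiment. The sketch below builds a throwaway archive in a temporary directory (all file and directory names are made up for the demonstration) and shows that `path` only changes the root under which the member's own directories are recreated.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo.tar")

# Build a throwaway archive holding a single member, path/to/file.txt
with tarfile.open(archive, "w") as tar:
    data = b"hello"
    info = tarfile.TarInfo("path/to/file.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

target = os.path.join(workdir, "out")
os.makedirs(target)
with tarfile.open(archive) as tar:
    tar.extract("path/to/file.txt", path=target)

# The member's directories are recreated underneath the target directory
nested = os.path.exists(os.path.join(target, "path", "to", "file.txt"))
flat = os.path.exists(os.path.join(target, "file.txt"))
print(nested, flat)  # True False
```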

score 0 · Answer 4 · answered Aug 07 '19 at 11:55

If you only want certain kinds of files (such as .xml or .html), you can filter on item.name.endswith('.xml'). To match the previous examples:

import os, tarfile
tar_file = "<your_tar_file>"  # placeholder: path to your tar file
exitfolder = "."  # your output path

with tarfile.open(tar_file, 'r:*') as tar:  # 'r:*' detects the compression (e.g. .tar.gz) automatically
  for item in tar:
    if item.isreg() and item.name.endswith('.xml'):  # only regular files with an .xml extension
      item.name = os.path.basename(item.name)  # remove the path
      tar.extract(item, exitfolder)  # extract

edited Jan 25 '22 at 18:24

answered Aug 07 '19 at 11:55

Marco smdm

use `tarfile.open(tar_file, 'r:*')` to accept all the compression formats – fabrizioM Jun 04 '20 at 06:03

score 0 · Answer 5 · answered Dec 06 '11 at 20:01

You could use TarFile.extractfile(member) to extract a specific file.

It returns a file-like object, which you can then use to write the contents to a file at any location you want.

ChristopheD

That would work, but it doesn't preserve the file metadata (modification time etc.). – Adam Rosenfield Dec 06 '11 at 20:25
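
If the extractfile() route is used anyway, some of the metadata can be restored by hand afterwards. The following is a sketch under that assumption: the helper name `extract_to` is hypothetical, and it restores only the permission bits and the modification time, not ownership or other attributes.

```python
import os
import shutil
import tarfile


def extract_to(tar_path, member_name, dest_path):
    """Copy one archive member to dest_path, then restore mode and mtime."""
    with tarfile.open(tar_path) as tar:
        member = tar.getmember(member_name)
        source = tar.extractfile(member)  # None if not a regular file or link
        if source is None:
            raise ValueError(f"{member_name} is not a regular file")
        with source, open(dest_path, "wb") as target:
            shutil.copyfileobj(source, target)
    os.chmod(dest_path, member.mode & 0o777)  # permission bits only
    os.utime(dest_path, (member.mtime, member.mtime))  # (atime, mtime)
    return dest_path
```

Because the destination is given as a full file path, this variant also allows renaming the file while extracting it.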
