104

I am trying to write a script that extracts all the .tar.gz files from the folders in one directory. For example, I have a file called testing.tar.gz. If I do it manually, I can click "extract here", and the .tar.gz file produces a new file called testing.tar. Finally, if I click "extract here" again, the .tar file gives me all the .pdf files.

How can I do this? My code is below, but it doesn't really work.

import os
import tarfile
import zipfile

def extract_file(path, to_directory='.'):
    if path.endswith('.zip'):
        opener, mode = zipfile.ZipFile, 'r'
    elif path.endswith('.tar.gz') or path.endswith('.tgz'):
        opener, mode = tarfile.open, 'r:gz'
    elif path.endswith('.tar.bz2') or path.endswith('.tbz'):
        opener, mode = tarfile.open, 'r:bz2'
    else: 
        raise ValueError, "Could not extract `%s` as no appropriate extractor is found" % path

    cwd = os.getcwd()
    os.chdir(to_directory)

    try:
        file = opener(path, mode)
        try: file.extractall()
        finally: file.close()
    finally:
        os.chdir(cwd)
Delimitry
Alex

7 Answers

166

Why do you want to "press" twice to extract a .tar.gz when you can easily do it in one step? Here is a simple snippet that extracts both .tar and .tar.gz files in one go:

import tarfile

if fname.endswith("tar.gz"):
    tar = tarfile.open(fname, "r:gz")
    tar.extractall()
    tar.close()
elif fname.endswith("tar"):
    tar = tarfile.open(fname, "r:")
    tar.extractall()
    tar.close()
Martin Thoma
Lye Heng Foo
  • 1
    That is because the file I have is a .tar.gz. To unzip it, I first have to extract the .tar.gz to get a .tar, and then extract once more to get the files I need, such as the .pdf files. – Alex Jun 17 '15 at 10:04
  • Also, your code isn't working: `if (fname.endswith("tar.gz")):` raises `NameError: name 'fname' is not defined` – Alex Jun 17 '15 at 10:04
  • 9
    @Alex `fname` would be a string that is your filename. – David Starkey May 16 '16 at 16:29
  • 1
    @Alex fname is the string of the filename that you are trying to un-tar. `files = [f for f in os.listdir('.') if os.path.isfile(f)] for fname in files: # do something, e.g. the above "if-elif" code.` – Lye Heng Foo Aug 17 '16 at 03:25
  • Sorry, looks like the inline code does not show up as multiple lines of code, but all the lines are merged into a single line. Hope you can get the idea, if not, please drop a comment, I will explain further. – Lye Heng Foo Aug 17 '16 at 03:37
  • How do you extract to another location? – Matthew Sep 21 '17 at 19:15
  • 6
    @Matthew You can use the path parameter in the extractall() command e.g. `tar.extractall(path="/new/dir/location")`. You can have more control too, e.g. if you need to extract only a few files inside the tar file using extract(). For more control, please take a look at the man page. https://docs.python.org/3/library/tarfile.html – Lye Heng Foo Sep 25 '17 at 04:47
  • The specific link to extract() command https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extract – Lye Heng Foo Sep 25 '17 at 05:03
  • Does `extractall` require root permission? I'm running as non-root and got a `PermissionError: [Errno 1] Operation not permitted:` error. – Lei Yang Jul 03 '20 at 04:01
  • @LeiYang No, you don't need root permission. Check that your directory is writable. – Lye Heng Foo Oct 12 '20 at 04:06
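
The loop described in the comments above, combined with the `path` argument of `extractall()`, can be written out over several lines. This is only a sketch for the original question (archives sitting in folders under one directory): the function name `extract_tarballs` and its parameters are made up for illustration, and it uses `os.walk` and the `"r:*"` mode (which lets `tarfile` detect the compression) rather than the separate `if`/`elif` branches from the answer.

```python
import os
import tarfile


def extract_tarballs(root_dir, dest_dir=None):
    """Extract every .tar, .tar.gz and .tgz file found under root_dir.

    Each archive is unpacked into dest_dir if one is given, otherwise into
    the folder that contains the archive. Returns the archives handled.
    (Hypothetical helper, not part of the tarfile module.)
    """
    # Collect the archives first so that freshly extracted files are not
    # picked up while the directory tree is still being walked.
    archives = []
    for folder, _subdirs, files in os.walk(root_dir):
        for fname in files:
            if fname.endswith((".tar", ".tar.gz", ".tgz")):
                archives.append(os.path.join(folder, fname))

    for archive in archives:
        target = dest_dir if dest_dir is not None else os.path.dirname(archive)
        # "r:*" opens the archive with transparent compression detection.
        with tarfile.open(archive, "r:*") as tar:
            tar.extractall(path=target)
    return archives
```

Calling `extract_tarballs("/path/to/folders")` would unpack each archive next to itself; passing a second argument sends everything to one destination instead.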
59

If you are using Python 3, you can use shutil.unpack_archive, which works for most of the common archive formats.

shutil.unpack_archive(filename[, extract_dir[, format]])

Unpack an archive. filename is the full path of the archive. extract_dir is the name of the target directory where the archive is unpacked. If not provided, the current working directory is used.

For example:

import shutil

def extract_all(archives, extract_path):
    for filename in archives:
        shutil.unpack_archive(filename, extract_path)
mickours
  • 4
    Is there any way to control the name of the extracted file? – Suraj Jun 11 '20 at 16:43
  • 3
    when the user has no root permission, `tarfile` cannot run, but `shutil` can. – Lei Yang Jul 03 '20 at 06:51
  • 1
    Finding the one line of python code that does what I need with minimum fuss sparks joy - thanks! I predict python will be the last programming language. – Mike Honey Aug 01 '20 at 00:22
  • @suraj-subramanian, the extract path will contain the new name. For example, if filename was "hello.tar.gz", extract_path might be "/tmp/my_name_here" – Justin Furuness Jun 02 '21 at 22:59
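
As the last comment suggests, the name you control is the extraction directory, not the names of the files inside the archive. One possible way to give each archive its own folder is sketched below. The helper `unpack_to_named_dir` and the suffix list are assumptions made for this example; the suffixes are the ones `shutil` registers by default as far as I know.

```python
import shutil
from pathlib import Path

# Archive suffixes that shutil.unpack_archive recognises by default.
KNOWN_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".tbz2",
                  ".tar.xz", ".txz", ".tar", ".zip")


def unpack_to_named_dir(archive, parent_dir):
    """Unpack archive into parent_dir/<archive name without its suffix>.

    Hypothetical helper: "hello.tar.gz" ends up in parent_dir/hello.
    """
    archive = Path(archive)
    name = archive.name
    # Strip compound suffixes such as ".tar.gz" as well as simple ones.
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    target = Path(parent_dir) / name
    target.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(target))
    return target
```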
7

Using context manager:

import os
import tarfile

# ... other code ...
with tarfile.open(os.path.join(os.environ['BACKUP_DIR'],
                  f'Backup_{self.batch_id}.tar.gz'), "r:gz") as so:
    so.extractall(path=os.environ['BACKUP_DIR'])
Taras Vaskiv
4

If you are using Python in a Jupyter notebook on a Linux machine, the following will do:

!tar -xvzf /path/to/file.tar.gz -C /path/to/save_directory

The leading ! tells Jupyter to run the command in the system shell.

mcgusty
1

The following worked for me for a .tar.gz file. It will extract files in your specified destination:

import tarfile

from os import mkdir
from os.path import isdir

src_path = 'path/to/my/source_file.tar.gz'
dst_path = 'path/to/my/destination'

# create destination dir if it does not exist
if not isdir(dst_path):
    mkdir(dst_path)

if src_path.endswith('tar.gz'):
    tar = tarfile.open(src_path, 'r:gz')
    tar.extractall(dst_path)
    tar.close()
Hafizur Rahman
0

You can execute a shell command from Python using envoy:

import envoy # pip install envoy

if (file.endswith("tar.gz")):
    envoy.run("tar xzf %s -C %s" % (file, to_directory))

elif (file.endswith("tar")):
    envoy.run("tar xf %s -C %s" % (file, to_directory))
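
If you would rather not install a third-party package, the same idea can probably be expressed with the standard library's `subprocess` module. The sketch below is an alternative to the envoy snippet, not a rewrite of it: the function name `tar_extract` is made up, and it assumes a `tar` executable is on the PATH and that `to_directory` already exists.

```python
import subprocess


def tar_extract(archive, to_directory):
    """Extract archive into to_directory using the system tar command.

    Hypothetical helper; requires a `tar` binary on the PATH.
    """
    if archive.endswith((".tar.gz", ".tgz")):
        flags = "-xzf"
    elif archive.endswith(".tar"):
        flags = "-xf"
    else:
        raise ValueError("unsupported archive type: %s" % archive)
    # Passing a list (no shell) avoids quoting problems with spaces in
    # paths; check=True raises CalledProcessError if tar reports an error.
    subprocess.run(["tar", flags, archive, "-C", to_directory], check=True)
```

Building the argument list by hand, instead of formatting one command string, means file names containing spaces or shell metacharacters are passed to `tar` unchanged.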
Ehsan
-3

When I ran your program, it worked perfectly for a .tar.gz and a .tgz file. It didn't give me the correct items when I opened the .zip, and .tbz was the only one that raised an error. I think you used the wrong method to unpack a .tbz, because the error said I had an incorrect file type, but I didn't. One way you could solve the .zip issue is to use os.system() and unzip it with a command-line tool (depending on your OS), because extraction returned a __MACOSX folder with nothing inside it even though I entered the path correctly. The only other error I encountered was that you used improper syntax for raising an error.
This is what you should have used:

raise ValueError("Error message here")

You used a comma and no parentheses. Hope this helps!
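
For reference, here is roughly what the question's function could look like with the `raise` fixed. This is a sketch that assumes Python 3. It also makes one change the question did not: it passes the destination to `extractall()` instead of calling `os.chdir()`, since changing the working directory first would break a relative `path`.

```python
import tarfile
import zipfile


def extract_file(path, to_directory="."):
    """Extract a .zip, .tar.gz/.tgz or .tar.bz2/.tbz archive."""
    if path.endswith(".zip"):
        opener, mode = zipfile.ZipFile, "r"
    elif path.endswith((".tar.gz", ".tgz")):
        opener, mode = tarfile.open, "r:gz"
    elif path.endswith((".tar.bz2", ".tbz")):
        opener, mode = tarfile.open, "r:bz2"
    else:
        # Python 3 syntax: the exception is called like a function.
        raise ValueError(
            "Could not extract `%s` as no appropriate extractor is found" % path
        )

    # Both ZipFile and TarFile work as context managers and accept the
    # destination directory as the first argument of extractall().
    with opener(path, mode) as archive:
        archive.extractall(to_directory)
```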

Beckett O'Brien