
I have a FolderA which contains FolderB and FileB. How can I create a tar.gz archive which ONLY contains FolderB and FileB, removing the parent directory FolderA? I'm using Python and I'm running this code on a Windows machine.

The best lead I found was: How to create full compressed tar file using Python?

In the most upvoted answer, people discuss ways to remove the parent directory, but none of them work for me. I've tried arcname, os.walk, and running the tar command via subprocess.call().

I got close with os.walk, but the code below still puts a "_" directory alongside FolderB and FileB, so the structure is ARCHIVE.tar.gz > ARCHIVE.tar > "_" directory, FolderB, FileB.

import os
import tarfile

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        length = len(source_dir)
        for root, dirs, files in os.walk(source_dir):
            folder = root[length:]  # path without "parent"
            for file in files:
                tar.add(os.path.join(root, folder), folder)

I make the archive using:

make_tarfile('ARCHIVE.tar.gz', r'C:\FolderA')

Should I carry on using os.walk, or is there any other way to solve this?

Update

Here is an image showing the contents of my archive:

[Image: contents of ARCHIVE.tar.gz]

As you can see, there is a "_" folder in my archive that I want to get rid of. Oddly enough, when I extract, only FolderB and FileB.html appear. In essence, the behavior is correct, but if I could take the last step of removing the "_" folder from the archive, that would be perfect. I'm going to ask an updated question to limit confusion.
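To pin down what the "_" entry actually is, one option is to print each member's raw name with repr(), since an empty or unusual name is easy to miss in a GUI listing. This is a minimal sketch; list_members is a hypothetical helper, not part of tarfile.

```python
import tarfile


def list_members(archive_path):
    """Return (name, kind) pairs for every member so odd names become visible."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically
    with tarfile.open(archive_path, "r:*") as tar:
        return [(m.name, "dir" if m.isdir() else "file") for m in tar.getmembers()]
```

For example, `for name, kind in list_members("ARCHIVE.tar.gz"): print(repr(name), kind)` prints an empty name as '' instead of a blank line.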

halfer
Andy
  • Why can't you just `tar.extractall(path=destination)`, where `tar` comes from `tarfile.open(FolderB_path)`? – Kacperito Oct 16 '19 at 22:11
  • If I were the only consumer of this .tar.gz, I could certainly do this, but I'm creating a .tar.gz for others to use that needs to have a specific structure. – Andy Oct 17 '19 at 00:33

4 Answers


This works for me:

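import os
import tarfile

with tarfile.open(output_filename, "w:gz") as tar:
    # Add each top-level entry under its bare name; directories are added recursively
    for fn in os.listdir(source_dir):
        p = os.path.join(source_dir, fn)
        tar.add(p, arcname=fn)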

That is, just list the top level of the source dir and add each entry to the archive under its own name. There's no need to walk the source dir, because tar.add() recurses into directories by default.
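A self-contained sketch of the same approach, wrapped in a function so it can be checked end to end. The function name make_flat_tarfile is hypothetical; sorting the listing is an optional extra that only makes the member order predictable.

```python
import os
import tarfile


def make_flat_tarfile(output_filename, source_dir):
    """Archive the *contents* of source_dir, without source_dir itself."""
    with tarfile.open(output_filename, "w:gz") as tar:
        for name in sorted(os.listdir(source_dir)):
            # arcname=name stores "FolderB" rather than the full path to FolderA\FolderB;
            # tar.add() then recurses into directories because recursive=True by default
            tar.add(os.path.join(source_dir, name), arcname=name)
```

Usage with the question's layout would be `make_flat_tarfile("ARCHIVE.tar.gz", r"C:\FolderA")`, which should store FileB, FolderB and FolderB's contents at the archive root.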

driedler

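You could use subprocess to call an external tar executable instead, which can be faster for large directories. Because -C switches into source_dir before archiving ".", FolderA itself is left out; note that the entries are then stored with a leading "./" (for example ./FolderB/...).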

import subprocess

def make_tarfile(output_filename, source_dir):
    subprocess.call(["tar", "-C", source_dir, "-zcvf", output_filename, "."])
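A slightly more defensive sketch of the same idea, assuming a tar executable is on PATH (recent Windows 10 and Windows 11 builds ship one as tar.exe). It uses subprocess.check_call so a failing tar raises instead of being silently ignored, resolves the output path to an absolute path so it does not depend on how -C is applied, and names each top-level entry explicitly so the archive holds FileB and FolderB/... rather than ./FileB and ./FolderB/.... The name make_tarfile_cli is hypothetical.

```python
import os
import shutil
import subprocess


def make_tarfile_cli(output_filename, source_dir):
    """Archive the contents of source_dir with an external tar binary."""
    tar_exe = shutil.which("tar")
    if tar_exe is None:
        raise FileNotFoundError("no 'tar' executable found on PATH")
    output_path = os.path.abspath(output_filename)
    # List the top-level entries explicitly instead of passing "."
    entries = sorted(os.listdir(source_dir))
    # check_call raises CalledProcessError if tar exits with a non-zero status
    subprocess.check_call([tar_exe, "-C", source_dir, "-czf", output_path, *entries])
```

One caveat: if the tar found on PATH is GNU tar rather than the Windows built-in one, an archive name containing a drive colon (C:\...) can be read as a remote host:file spec; GNU tar's --force-local option exists for that case.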
Hardian Lawi

Here are some examples of how changing the source directory you pass makes a difference to what finally gets extracted.

Following your example, I have this folder structure:

[Image: folder structure]

I use this Python code to generate the tar file (lifted from here):

import tarfile
import os

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        # arcname is the name the top-level entry gets inside the archive
        tar.add(source_dir, arcname=os.path.basename(source_dir))

What ends up in the tar file, and with what structure, depends on the path I pass as source_dir.

So this call,

make_tarfile('folder.tar.gz','folder_A/' )

generates this result when extracted:

[Image: extracted result]

If I reference folder_B inside folder_A instead,

make_tarfile('folder.tar.gz','folder_A/folder_B' )

this is what gets extracted:

[Image: extracted result]

Notice that folder_B is the root of this extract.

Finally, with a trailing slash on the same path,

make_tarfile('folder.tar.gz','folder_A/folder_B/' )

this extracts to:

[Image: extracted result]

Just the file is included in the extract.
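A quick way to see why the last two calls differ, assuming the make_tarfile function above: os.path.basename() returns an empty string when the path ends in a separator, so the trailing slash turns arcname into '' and the folder's contents land at the archive root. One plausible side effect is that tarfile still records folder_B itself as an entry, just with an empty name, which may be what some archive viewers show as a placeholder folder such as the "_" in the question (the question's os.walk loop also passes an empty arcname on its first pass). The demo folder and file.txt below are hypothetical.

```python
import os
import tarfile
import tempfile


def make_tarfile(output_filename, source_dir):
    # Same function as above: arcname comes from the last path component
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))


# A trailing separator leaves basename() nothing to return
print(repr(os.path.basename("folder_A/folder_B")))   # 'folder_B'
print(repr(os.path.basename("folder_A/folder_B/")))  # ''

# Rebuild the third example in a throwaway directory and list what was stored
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "folder_A", "folder_B")
    os.makedirs(src)
    with open(os.path.join(src, "file.txt"), "w") as f:
        f.write("hello")
    out = os.path.join(tmp, "folder.tar.gz")
    make_tarfile(out, src + os.sep)  # trailing separator, like 'folder_A/folder_B/'
    with tarfile.open(out) as tar:
        names = tar.getnames()

print(names)
```

Printing names with repr() makes any empty-named entry easy to spot.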

the_good_pony
  • Thanks for responding. The key thing is (and sorry if I did a poor job conveying this), FolderB and FileB are in the same directory level, they're both directly in C:\FolderA. So your 1st example will extract FolderA and its contents; your 2nd example will extract FolderB and its contents; but I want FolderA's contents without FolderA appearing at all. I have tried with arcname=os.path.basename(source_dir) but FolderA always gets included. – Andy Oct 17 '19 at 00:31

Here is a function that performs the task. I had some issues extracting the tar on Windows with WinRAR, as it seemed to try to extract the same file twice, but I think it will work fine when the archive is extracted properly.

"""
The directory structure I have is as follows:

├───FolderA
│   │   FileB
│   │
│   └───FolderB
│           FileC
"""

import tarfile
import os

# This is where I stored FolderA on my computer
ROOT = os.path.join(os.path.dirname(__file__), "FolderA")


def make_tarfile(output_filename: str, source_dir: str) -> bool:
    """
    Archive everything inside source_dir, without source_dir itself.

    :return: True on success, False otherwise
    """

    # This is where the path to each file and folder will be saved
    paths_to_tar = set()

    # os.walk over the root folder ("FolderA") - note it will never get added
    for dirpath, dirnames, filenames in os.walk(source_dir):

        # Normalize the path (e.g. mixed / and \ separators on Windows)
        dirpath = os.path.normpath(dirpath)

        # Add the path of every folder and file in the current directory
        paths_to_tar.update(os.path.join(dirpath, d) for d in dirnames)
        paths_to_tar.update(os.path.join(dirpath, f) for f in filenames)

    try:
        # This will create the tar file in the current directory
        with tarfile.open(output_filename, "w:gz") as tar:

            # Change into source_dir so the relative paths below resolve
            # (note: this changes the working directory for the whole process)
            os.chdir(source_dir)

            # Finally add each path using the relative path
            for path in paths_to_tar:
                tar.add(os.path.relpath(path, source_dir))
            return True

    except (tarfile.TarError, OSError) as e:
        print(f"An error occurred - {e}")
        return False


if __name__ == '__main__':
    make_tarfile("tarred_files.tar.gz", ROOT)
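The duplicate entries WinRAR tripped over plausibly come from the loop above: paths_to_tar contains both FolderB and FolderB/FileC, and tar.add() recurses into FolderB by default, so FileC gets stored twice. A minimal variant, assuming the same FolderA layout, passes recursive=False and uses arcname instead of os.chdir(), so the process's working directory is left alone. The name make_tarfile_no_dupes is hypothetical.

```python
import os
import tarfile


def make_tarfile_no_dupes(output_filename, source_dir):
    """Archive the contents of source_dir, storing every path exactly once."""
    with tarfile.open(output_filename, "w:gz") as tar:
        for dirpath, dirnames, filenames in os.walk(source_dir):
            dirnames.sort()  # sorting in place also fixes the order os.walk descends in
            for name in dirnames + sorted(filenames):
                path = os.path.join(dirpath, name)
                # recursive=False: os.walk already visits everything below this folder,
                # so letting tar.add() recurse too would store those files a second time
                tar.add(path, arcname=os.path.relpath(path, source_dir), recursive=False)
```

Calling `make_tarfile_no_dupes("tarred_files.tar.gz", ROOT)` should then yield one entry each for FileB, FolderB and FolderB/FileC.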
Kacperito
  • Hi Kacper, thanks for the reply! I was able to use the 7-Zip CLI to achieve what I needed, but I'd definitely like to find time to give your solution a try too. I posted my findings in a different question: https://stackoverflow.com/questions/58423574/python-created-tar-gz-file-contains-folder-how-to-remove/ – Andy Oct 18 '19 at 19:09