I am using Python 3's tarfile module to compress some folders (with subfolders). I need to exclude a couple of those subfolders from the final tar file.

For example, say my folders looked like:

dir/
├── subdirA
│   ├── subsubdirA1
│   │   ├── fileA11.txt
│   │   └── fileA12.txt
│   ├── subsubdirA2
│   │   ├── fileA21.txt
│   │   └── fileA22.txt
│   └── fileA.txt
├── subdirB
│   ├── subsubdirB1
│   │   ├── fileB11.txt
│   │   └── fileB12.txt
│   ├── subsubdirB2
│   │   ├── fileB21.txt
│   │   └── fileB22.txt
│   └── fileB.txt
└── main.txt

Now, say I want to include everything in dir/ except the contents of subsubdirA2 and subsubdirB2. Based on this answer, I have tried:

EXCLUDE_FILES = ['/subdirA/subsubdirA2', '/subdirB/subsubdirB2']
mytarfile.add(..., filter=lambda x: None if x.name in EXCLUDE_FILES else x)

Or:

EXCLUDE_FILES = ['/subdirA/subsubdirA2/*', '/subdirB/subsubdirB2/*']
mytarfile.add(..., filter=lambda x: None if x.name in EXCLUDE_FILES else x)

Or:

EXCLUDE_FILES = ['/subdirA/subsubdirA2/*.*', '/subdirB/subsubdirB2/*.*']
mytarfile.add(..., filter=lambda x: None if x.name in EXCLUDE_FILES else x)

I also tried variants of the three options above in which the subfolder paths started without /, with dir, or with /dir. None of them worked: every time, everything within dir was included.
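To see what the filter actually compares against, it helps to print the name of every member tarfile passes to it. Below is a minimal sketch, assuming the folder is added with arcname="dir" (the original call elides its arguments) and using a hypothetical helper, member_names, on a tiny throwaway tree in a temporary directory:

```python
import io
import os
import tarfile
import tempfile


def member_names(root: str) -> list:
    """Hypothetical helper: collect the names tarfile hands to the filter."""
    seen = []

    def record(info: tarfile.TarInfo) -> tarfile.TarInfo:
        seen.append(info.name)  # remember the name, keep the member
        return info

    buf = io.BytesIO()  # in-memory archive, nothing is written to disk
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Assumption: the folder is added under the archive name "dir"
        tar.add(root, arcname="dir", filter=record)
    return seen


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "dir")
    os.makedirs(os.path.join(root, "subdirA", "subsubdirA2"))
    with open(os.path.join(root, "subdirA", "subsubdirA2", "fileA21.txt"), "w") as f:
        f.write("example")
    names = member_names(root)

for name in names:
    print(name)
# dir
# dir/subdirA
# dir/subdirA/subsubdirA2
# dir/subdirA/subsubdirA2/fileA21.txt
```

The names are relative paths that start at the top-level folder, with no leading slash and no wildcard expansion, so `x.name in EXCLUDE_FILES` only matches an entry spelled exactly the same way, which the paths with a leading / or a trailing /* never are.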

How could I correctly exclude specific subfolders from a tar file I want to generate? If a different module/library is required instead of tarfile, that is fine.

eyllanesc

2 Answers

I didn't find anything in tarfile that does this the way you need, but you can run the tar shell command through subprocess, like this:

import subprocess

# Paths to leave out, relative to the directory where tar is run
exclude = ['dir/subdirA/subsubdirA2', 'dir/subdirB/subsubdirB2']

# Build the command as a list; the --exclude options come before 'dir',
# because recent GNU tar versions apply them only to the paths that follow them
cmd = ['tar', '-czvf', 'dir.tar.gz']
for x in exclude:
    cmd += ['--exclude', x]
cmd.append('dir')
print(' '.join(cmd))

# Run tar without a shell; with -v it lists each archived file on stdout
process = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
for line in process.stdout.decode('utf-8').splitlines():
    print(line)

In this example, the printed command is:

tar -czvf dir.tar.gz --exclude dir/subdirA/subsubdirA2 --exclude dir/subdirB/subsubdirB2 dir
GiovaniSalazar

I think the entries in EXCLUDE_FILES need to be matched against the member names as patterns, rather than compared for exact equality. Here is how I would do that:

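import re
# Archive member names, e.g. 'dir/subdirA/subsubdirA2' when the folder is added as 'dir'
EXCLUDE_FILES = ['dir/subdirA/subsubdirA2', 'dir/subdirB/subsubdirB2']
# Escape each path, then require '/' or the end of the name after it (form a pattern string)
pattern = '(?:%s)(?:/|$)' % '|'.join(map(re.escape, EXCLUDE_FILES))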

Then pass a filter that checks each name against the pattern with re.match:

mytarfile.add(..., filter=lambda x: None if re.match(pattern, x.name) else x)

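A member is excluded when its name (x.name) matches any of the patterns in EXCLUDE_FILES. Since tarfile does not descend into a directory that the filter rejects, excluding the directory also excludes everything inside it. Hope this helps.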
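A regex-free variant of the same filter, offered as a sketch: test whether each member name equals one of the excluded directories or starts with it followed by /. It assumes the folder is added under the archive name dir; make_tree and build_archive are hypothetical helpers that recreate part of the example layout in a temporary directory and write the archive as .tar.gz.

```python
import os
import tarfile
import tempfile

# Directories to leave out, written as archive member names (no leading slash)
EXCLUDE_DIRS = ("dir/subdirA/subsubdirA2", "dir/subdirB/subsubdirB2")


def exclude_dirs(info: tarfile.TarInfo):
    """Drop a member if it is, or lies under, one of EXCLUDE_DIRS."""
    for d in EXCLUDE_DIRS:
        if info.name == d or info.name.startswith(d + "/"):
            return None  # tarfile skips it and does not recurse into it
    return info


def make_tree(root: str) -> None:
    """Hypothetical helper: recreate part of the example layout with empty files."""
    files = [
        "subdirA/subsubdirA1/fileA11.txt", "subdirA/subsubdirA2/fileA21.txt",
        "subdirA/fileA.txt", "subdirB/subsubdirB1/fileB11.txt",
        "subdirB/subsubdirB2/fileB21.txt", "subdirB/fileB.txt", "main.txt",
    ]
    for rel in files:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


def build_archive(src: str, out: str) -> list:
    """Hypothetical helper: write out as .tar.gz and return its member names."""
    with tarfile.open(out, "w:gz") as tar:
        # Assumption: the folder is stored under the archive name "dir"
        tar.add(src, arcname="dir", filter=exclude_dirs)
    with tarfile.open(out, "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "dir")
    make_tree(src)
    # The archive is written next to dir, not inside it
    names = build_archive(src, os.path.join(tmp, "dir.tar.gz"))

for name in names:
    print(name)
```

Checking for the trailing / keeps a sibling such as dir/subdirA/subsubdirA20 from being excluded by accident.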

S.Au.Ra.B.H