53

I'm looking for a way to include/exclude file patterns and exclude directories in an os.walk() call.

Here's what I'm doing so far:

import fnmatch
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def _filter(paths):
    for path in paths:
        if os.path.isdir(path) and not path in excludes:
            yield path

        for pattern in (includes + excludes):
            if not os.path.isdir(path) and fnmatch.fnmatch(path, pattern):
                yield path

for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))

    for filename in files:
        filename = os.path.join(root, filename)

        print(filename)

Is there a better way to do this? How?

Paulo Freitas

8 Answers

65

This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes that includes is used only for files):

import fnmatch
import os
import os.path
import re

includes = ['*.doc', '*.odt'] # for files only
excludes = ['/home/paulo-freitas/Documents'] # for dirs and files

# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'

for root, dirs, files in os.walk('/home/paulo-freitas'):

    # exclude dirs
    dirs[:] = [os.path.join(root, d) for d in dirs]
    dirs[:] = [d for d in dirs if not re.match(excludes, d)]

    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]

    for fname in files:
        print(fname)
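
As a small illustration of the two ideas in the code above (translating globs to one combined regex, and falling back to a regex that can never match when the exclude list is empty), here is a minimal sketch. The names include_patterns, exclude_patterns and is_wanted are made up for this example and are not part of the answer.

```python
import fnmatch
import re

# Hypothetical sample patterns, mirroring the answer above.
include_patterns = ['*.doc', '*.odt']
exclude_patterns = []  # deliberately empty to show the fallback

# fnmatch.translate turns each glob into a regex string; '|' joins them.
include_re = re.compile('|'.join(fnmatch.translate(p) for p in include_patterns))

# Joining an empty list gives '', which would match every string, so fall
# back to '$.', a regex that cannot match anything.
exclude_re = re.compile(
    '|'.join(fnmatch.translate(p) for p in exclude_patterns) or r'$.')


def is_wanted(path):
    """True if path matches an include pattern and no exclude pattern."""
    return bool(include_re.match(path)) and not exclude_re.match(path)
```

With these definitions, is_wanted('/home/user/report.doc') is true and is_wanted('/home/user/notes.txt') is false, even though the exclude list is empty.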
Oben Sonne
  • Ermm, we need `if excludes` checks in both `re.match(excludes, ...)`, no? If `excludes = []`, it'll match all entries. But I liked your approach, much more clear. :) – Paulo Freitas Feb 28 '11 at 14:03
  • 3
    @pf.me: You're right, I did not consider that case. So either you *1)* wrap the exclude list comprehension in an `if excludes`, *2)* prefix `not re.match(excludes, ...)` with `not excludes or`, or *3)* set `excludes` to a never-matching regex if the original excludes is empty. I updated my answer using variant *3*. – Oben Sonne Feb 28 '11 at 14:29
  • 8
    After some googling, it would appear that the point of the [:] syntax `dirs[:] = [os.path.join(root, d) for d in dirs]` is to employ the mutating slice method, which alters the list in place, instead of creating a new list. This caught me out - without the [:], it doesn't work. – hajamie Oct 30 '12 at 17:31
  • I still do not get the mechanics: how does dirs[:] alter the original list? All the manuals say that slice [:] returns a fresh copy of the list, with members as pointers to the original list values. [Here is a discussion on Stack about this.](http://stackoverflow.com/questions/509211/the-python-slice-notation) So how does it happen that **dirs[:]** alters the original list? – Danylo Gurianov May 14 '13 at 14:19
  • 3
    @Daniel: Slicing may not only be used to *get* values of a list but also to *assign* selected items. As `[:]` denotes the complete list, assigning to this slice replaces the whole previous content of the list. See http://docs.python.org/2/library/stdtypes.html#mutable-sequence-types. – Oben Sonne May 17 '13 at 22:24
  • As stated below by @kojiro, I guess you need to provide `topdown=True` to os.walk, so that the `dirs` can be modified in place? – blueFast Oct 30 '13 at 10:22
  • @gonvaled `topdown=True` is the default. – Oben Sonne Nov 29 '13 at 21:27
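
The slice-assignment point discussed in the comments above can be checked with a short sketch. The helper names rebind and mutate are invented for illustration; the only claim is the standard list behavior: plain assignment rebinds a name, while assigning to [:] replaces the contents of the existing list object, which is the object os.walk keeps using.

```python
def rebind(names):
    # Plain assignment only rebinds the local name; the caller's list
    # object is left untouched.
    names = [n for n in names if not n.startswith('.')]


def mutate(names):
    # Slice assignment replaces the contents of the same list object,
    # so the caller sees the change.
    names[:] = [n for n in names if not n.startswith('.')]


dirs_a = ['.git', 'src', 'docs']
dirs_b = ['.git', 'src', 'docs']
rebind(dirs_a)
mutate(dirs_b)
print(dirs_a)  # still contains '.git'
print(dirs_b)  # '.git' has been removed
```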
24

From docs.python.org:

os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])

When topdown is True, the caller can modify the dirnames list in-place … this can be used to prune the search …

for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
    # excludes can be done with fnmatch.filter and complementary set,
    # but it's more annoying to read.
    dirs[:] = [d for d in dirs if d not in excludes] 
    for pat in includes:
        for f in fnmatch.filter(files, pat):
            print(os.path.join(root, f))

I should point out that the code above assumes excludes holds bare directory names, not full paths. To match the OP's case, change the condition in the list comprehension to: if os.path.join(root, d) not in excludes.
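
For readers wondering what includes and excludes might look like, here is a hedged sketch that assumes includes is a list of glob patterns and the excludes are full directory paths, as in the question. The function name find_files and its parameters are hypothetical. Because the excluded directory is removed from dirs in place, os.walk never descends into it, so everything below it is skipped as well.

```python
import fnmatch
import os


def find_files(top, includes, excluded_dirs):
    """Yield paths under top that match a glob in includes.

    excluded_dirs is assumed to hold full directory paths (an assumption
    made for this sketch, matching the question's excludes list).
    """
    excluded = {os.path.normpath(p) for p in excluded_dirs}
    for root, dirs, files in os.walk(top, topdown=True):
        # Prune in place: removed directories are never visited, so their
        # whole subtree is skipped.
        dirs[:] = [d for d in dirs
                   if os.path.normpath(os.path.join(root, d)) not in excluded]
        for pat in includes:
            for f in fnmatch.filter(files, pat):
                yield os.path.join(root, f)


# Example call, using the paths from the question:
# for path in find_files('/home/paulo-freitas', ['*.doc', '*.odt'],
#                        ['/home/paulo-freitas/Documents']):
#     print(path)
```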

kojiro
  • 3
    What do `excludes` and `includes` look like here? Is there an example to go with this answer? – user5359531 Jun 17 '16 at 19:02
  • A dumb question: if I exclude a directory, does that exclude _everything_ under that directory, or does it only skip that directory but still navigate to its subdirectories? Sorry if this has been asked before. Basically, if I want to exclude the directory and everything under it, what should I be looking for? – ApJo Jul 20 '23 at 21:15
12

why fnmatch?

import os
excludes=....
for ROOT,DIR,FILES in os.walk("/path"):
    for file in FILES:
        if file.endswith(('doc','odt')):
            print(file)
    for directory in DIR:
        if directory not in excludes:
            print(directory)

not exhaustively tested

kurumi
  • 2
    The endswith arguments should be .doc and .odt instead, because a file with a name such as mydoc [with no file extension] will be returned by the above code. Also, I think this will meet just the specific case the OP has posted. The excludes may contain files too, and the includes may contain dirs, I guess. – aNish Feb 28 '11 at 11:50
  • You need `fnmatch` if you have to make use of glob patterns (though this is not the case in the example given in the question). – Oben Sonne Feb 28 '11 at 12:38
  • @Oben Sonne, glob (IMO) has more "functionality" than fnmatch. for eg, path name expansion. You could do this for example `glob.glob("/path/*/*/*.txt")`. – kurumi Feb 28 '11 at 12:56
  • Good point. For simple include/exclude patterns `glob.glob()` would probably be the better solution overall. – Oben Sonne Feb 28 '11 at 13:04
  • As a matter of good practice, and to simplify debugging, I try not to use variable names that match built-in types, like your use of "file", which is a built-in type. – DevPlayer Sep 10 '13 at 12:14
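
The glob suggestion from the comments above could look roughly like the sketch below. It assumes Python 3.5 or later, where glob.glob accepts recursive=True and the '**' wildcard; the function name glob_documents is made up here. Note that glob offers no built-in way to prune directories, so excludes would still need a separate filtering step, and by default it skips names that start with a dot.

```python
import glob
import os


def glob_documents(top, extensions=('doc', 'odt')):
    """Return files under top (searched recursively) with a given extension."""
    matches = []
    for ext in extensions:
        # '**' with recursive=True matches zero or more directory levels.
        # glob.escape protects any wildcard characters in top itself.
        pattern = os.path.join(glob.escape(top), '**', '*.' + ext)
        matches.extend(glob.glob(pattern, recursive=True))
    return sorted(matches)
```

Because the pattern includes the leading dot ('*.doc'), a file named mydoc with no extension is not returned.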
1

dirtools is perfect for your use-case:

from dirtools import Dir

print(Dir('.', exclude_file='.gitignore').files())
michaeljoseph
0
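import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def file_search(path, pattern):
    # '*.doc' -> '.doc': compare against the extension only
    ext = pattern.lstrip('*')
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(ext):
                print(os.path.join(root, name))

for pattern in includes:
    file_search(excludes[0], pattern)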
juniour
0

This is an example of excluding directories and files with os.walk():

import os
import shutil

ignoreDirPatterns=[".git"]
ignoreFilePatterns=[".php"]
def copyTree(src, dest, onerror=None):
    src = os.path.abspath(src)
    src_prefix = len(src) + len(os.path.sep)
    for root, dirs, files in os.walk(src, onerror=onerror):
        for pattern in ignoreDirPatterns:
            if pattern in root:
                break
        else:
            # Runs only if the loop above finished without break (no dir pattern matched)
            for file in files:
                for pattern in ignoreFilePatterns:
                    if pattern in file:
                        break
                else:
                    # Runs only if the loop above finished without break (no file pattern matched)
                    dirpath = os.path.join(dest, root[src_prefix:])
                    try:
                        os.makedirs(dirpath,exist_ok=True)
                    except OSError as e:
                        if onerror is not None:
                            onerror(e)
                    filepath=os.path.join(root,file)
                    shutil.copy(filepath,dirpath)
                continue  # no-op: reached whether or not the else branch above ran

        continue  # no-op: reached whether or not the else branch above ran

Requires Python >= 3.2 because of the exist_ok argument to os.makedirs.
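
The code above relies on Python's for/else construct: the else branch runs only when the loop finishes without hitting break. A minimal sketch of that idiom in isolation, using a hypothetical helper named is_ignored:

```python
def is_ignored(name, patterns):
    """Return True if any pattern occurs as a substring of name."""
    for pattern in patterns:
        if pattern in name:
            break          # a pattern matched: leave the loop early
    else:
        return False       # loop ended without break: nothing matched
    return True
```

The same check can be written as any(pattern in name for pattern in patterns), which may be easier to read when the loop body has nothing else to do.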

Jahid
0

Here is one way to do that

import fnmatch
import os

excludes = ['/home/paulo-freitas/Documents']
matches = []
for path, dirs, files in os.walk(os.getcwd()):
    # skip any directory whose path contains an excluded path
    if any(eachpath in path for eachpath in excludes):
        continue
    for filename in files:
        if fnmatch.fnmatch(filename, '*.doc') or fnmatch.fnmatch(filename, '*.odt'):
            matches.append(os.path.abspath(os.path.join(path, filename)))
print(matches)
Senthil Kumaran
0

The above methods did not work for me.

So this is what I came up with, an expansion of my original answer to another question.

What worked for me was:

if (not (str(root) + '/').startswith(tuple(exclude_foldr)))

which builds the directory path as a string and skips it when it starts with any of my listed folders.
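
As a rough, self-contained illustration of this check: str.startswith accepts a tuple of prefixes and returns True if any of them matches. The folder names and the helper is_excluded below are hypothetical, and the sketch assumes each excluded folder is written with a trailing slash, so that a sibling folder with a longer name is not excluded by accident.

```python
# Hypothetical folder names, for illustration only; each ends with '/'.
exclude_folders = ('/Users/me/Documents/GitHub/',
                   '/Users/me/Documents/Random/')


def is_excluded(root):
    """True if root is one of the excluded folders or lies below one."""
    # Normalise to exactly one trailing slash before comparing, so that
    # '/Users/me/Documents/Random_Notes' does not match '.../Random/'.
    return (root.rstrip('/') + '/').startswith(exclude_folders)
```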

This gave me the exact result I was looking for.

My goal for this was to keep my Mac organized.

I can search any folder by path, locate and move specific file types, ignore subfolders, and prompt the user up front about whether to move the files.

NOTE: The prompt appears only once per run, NOT once per file.

The prompt defaults to NO when you hit Enter at [y/N], in which case the script just lists the files that would be moved.

This is only a snippet from my GitHub; please visit it for the complete script.

HINT: Read the script below; I added a comment on each line explaining what I did.

#!/usr/bin/env python3
# =============================================================================
# Created On  : MAC OSX High Sierra 10.13.6 (17G65)
# Created On  : Python 3.7.0
# Created By  : Jeromie Kirchoff
# =============================================================================
"""THE MODULE HAS BEEN BUILD FOR KEEPING YOUR FILES ORGANIZED."""
# =============================================================================
from os import walk
from os import path
from shutil import move
import getpass
import click

mac_username = getpass.getuser()
includes_file_extensn = ([".jpg", ".gif", ".png", ".jpeg", ])
search_dir = path.dirname('/Users/' + mac_username + '/Documents/')
target_foldr = path.dirname('/Users/' + mac_username + '/Pictures/Archive/')
exclude_foldr = set([target_foldr,
                    path.dirname('/Users/' + mac_username +
                                 '/Documents/GitHub/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Random/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Stupid_Folder/'),
                     ])

if click.confirm("Would you like to move files?",
                 default=False):
    question_moving = True
else:
    question_moving = False


def organize_files():
    """THE MODULE HAS BEEN BUILD FOR KEEPING YOUR FILES ORGANIZED."""
    # topdown=True required for filtering.
    # "root" had all the info I needed to filter folders, not "dir".
    for root, dir, files in walk(search_dir, topdown=True):
        for file in files:
            # creating a directory to str and excluding folders that start with
            if (not (str(root) + '/').startswith(tuple(exclude_foldr))):
                # showcase only the file types looking for
                if (file.endswith(tuple(includes_file_extensn))):
                    # using path.normpath as i found an issue with double //
                    # in file paths.
                    filetomove = path.normpath(str(root) + '/' +
                                               str(file))
                    # forward slash required for both to split
                    movingfileto = path.normpath(str(target_foldr) + '/' +
                                                 str(file))
                    # Answering "NO" this only prints the files "TO BE Moved"
                    print('Files To Move: ' + str(filetomove))
                    # This is using the prompt you answered at the beginning
                    if question_moving is True:
                        print('Moving File: ' + str(filetomove) +
                              "\n To:" + str(movingfileto))
                        # This is the command that moves the file
                        move(filetomove, movingfileto)
            # Excluded folders and non-matching files are simply skipped.


if __name__ == '__main__':
    organize_files()

Example of running my script from terminal:

$ python3 organize_files.py
Exclude list: {'/Users/jkirchoff/Pictures/Archive', '/Users/jkirchoff/Documents/Stupid_Folder', '/Users/jkirchoff/Documents/Random', '/Users/jkirchoff/Documents/GitHub'}
Files found will be moved to this folder:/Users/jkirchoff/Pictures/Archive
Would you like to move files?
No? This will just list the files.
Yes? This will Move your files to the target folder.
[y/N]: 

Example of listing files:

Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
...etc

Example of moving files:

Moving File: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
To: /Users/jkirchoff/Pictures/Archive/1.custom-award-768x512.jpg
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
To: /Users/jkirchoff/Pictures/Archive/10351458_318162838331056_9023492155204267542_n.jpg
...
JayRizzo