30

Python 3.5's os.scandir(path) function returns lightweight DirEntry objects that are very helpful with information about files. However, it only works for the immediate path handed to it. Is there a way to wrap it in a recursive function so that it visits all subdirectories beneath the given path?

Ben Hoyt
  • 10,694
  • 5
  • 60
  • 84
Mark
  • 313
  • 1
  • 3
  • 5
  • 1
    Take a look at [os.walk()](https://docs.python.org/3.5/library/os.html#os.walk). It may be a bit more heavy-handed than your looking for, but it should be more simple than creating your own solution. – skrrgwasme Oct 14 '15 at 20:35

1 Answers1

57

You can scan recursively using os.walk(), or if you need DirEntry objects or more control, write a recursive function like scantree() below:

try:
    from os import scandir
except ImportError:
    from scandir import scandir  # use scandir PyPI module on Python < 3.5

def scantree(path):
    """Recursively yield DirEntry objects for given directory."""
    for entry in scandir(path):
        if entry.is_dir(follow_symlinks=False):
            yield from scantree(entry.path)  # see below for Python 2.x
        else:
            yield entry

if __name__ == '__main__':
    import sys
    for entry in scantree(sys.argv[1] if len(sys.argv) > 1 else '.'):
        print(entry.path)

Notes:

  • There are a few more examples in PEP 471 and in the os.scandir() docs.
  • You can also add various logic in the for loop to skip directories or files starting with '.' and that kind of thing.
  • You typically want follow_symlinks=false on the is_dir() calls in recursive functions like this, to avoid symlink loops.
  • On Python 2.x, replace the yield from line with:

    for entry in scantree(entry.path):
        yield entry
    
Ben Hoyt
  • 10,694
  • 5
  • 60
  • 84
  • 1
    Given that `os.scandir` only exists in Python 3.5, the Python 2 fallback code probably isn't needed. :-) __Edit:__ Ah, you wrote it to import the PyPI module if `os.scandir` didn't exist, and I'm guessing the PyPI module is available for 2.7? – ShadowRanger Oct 14 '15 at 20:46
  • 2
    @ShadowRanger Well, true, but this way it'll work for Python < 3.5 (including Python 2.x) using my [scandir](https://pypi.python.org/pypi/scandir) module. :-) – Ben Hoyt Oct 14 '15 at 20:48
  • @ShadowRanger I've added a code comment to clarify. – Ben Hoyt Oct 14 '15 at 20:51
  • you're the author of the backport! congrats it's a life saver – Jean-François Fabre Feb 22 '19 at 14:46
  • 4
    os.walk vs os.scandir -- I ran these 2 functions against a directory with over 4 million directories and files. os.walk took 34 min, 29 sec and os.scandir took 7 minutes, 46 sec. So, at least for my test, it appears os.scandir is about 4 1/2 times faster. – tzg Dec 17 '19 at 19:25
  • The [**`scandir`**](https://docs.python.org/3/library/os.html#os.scandir) documentation also recommends using a context manager, i.e. `with scandir(path) as entries:` – Peter Wood Apr 15 '20 at 17:08
  • 3
    It's worth noting that as a part of PEP 471 os.walk was updated to use os.scandir under the hood. – Slater Victoroff Jan 06 '21 at 18:09