
I know how to recursively list all files/folders of d:\temp with various methods, see How to use glob() to find files recursively?.
But often I'd like to avoid having the d:\temp\ prefix in the results, and get paths relative to this base instead.

This can be done with:

  • import os, glob
    for f in glob.glob('d:\\temp\\**\\*', recursive=True):
        print(os.path.relpath(f, 'd:\\temp'))
    
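  • idem with f.lstrip('d:\\temp\\') to remove this prefix (caveat: lstrip strips a set of characters, not a prefix, so it can also eat the first characters of a file name)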

  • import pathlib
    root = pathlib.Path("d:\\temp")
    print([p.relative_to(root) for p in root.glob("**/*")])
    

These 3 solutions give relative paths. But if you read the source of glob.py, it accumulates/joins all the parts of the path. So the solutions above amount to "removing something that was just added before"! It works, but it's not very elegant. The same goes for pathlib, where relative_to removes the prefix.
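
One caveat about the lstrip variant: str.lstrip treats its argument as a set of characters to strip, not as a literal prefix, so it can remove more than the base folder. A minimal sketch of the difference, using plain strings only (strip_prefix is a hypothetical helper name, not something from the standard library):

```python
def strip_prefix(path, prefix):
    """Hypothetical helper: drop `prefix` only if `path` really starts with it."""
    return path[len(prefix):] if path.startswith(prefix) else path

prefix = 'd:\\temp\\'
path = 'd:\\temp\\test.txt'

# lstrip keeps removing leading characters as long as they are in the
# set {d, :, \, t, e, m, p}, so the "te" of "test.txt" is lost as well.
print(path.lstrip(prefix))          # st.txt
print(strip_prefix(path, prefix))   # test.txt
```

On Python 3.9 or later, str.removeprefix does the same job as the helper.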

Question: how to modify the next few lines to not have d:\temp in the output (without removing something that was concatenated before!)?

import os

def listpath(path):
    for f in os.scandir(path):
        f2 = os.path.join(path, f)
        if os.path.isdir(f):
            yield f2
            yield from listpath(f2)
        else:
            yield f2

for f in listpath('d:\\temp'):
    print(f)

#d:\temp\New folder
#d:\temp\New folder\New Text Document - Copy.txt
#d:\temp\New folder\New Text Document.txt
#d:\temp\New Text Document - Copy.txt
#d:\temp\New Text Document.txt
Basj
  • Style tip: use rawstrings on paths to avoid all that escaping: `r'd:\temp\**\*'`. (Btw, Windows has supported forward-slash in paths since back in 1995, you can also do `r'd:/temp/**/*'`) – smci Dec 05 '20 at 03:42

1 Answer


You can do something like the following example. Basically, we recursively yield the path parts joined together, but we never join the initial root into the result; root is only used to build the directory that gets scanned.

import os

def listpath(root, parent=''):
    scan = os.path.join(root, parent)
    for f in os.scandir(scan):
        f2 = os.path.join(parent, f.name)
        yield f2
        if f.is_dir():
            yield from listpath(root, f2)

for f in listpath('d:\\temp'):
    print(f)
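
To check the behavior without touching d:\temp, here is a small self-contained sketch that builds a throwaway tree with tempfile and runs the same generator on it. The file names (a.txt, sub, b.txt) are made up for the illustration, and the generator is repeated only so the snippet runs on its own.

```python
import os
import tempfile

def listpath(root, parent=''):
    # Same idea as above: scan root/parent, but yield paths built from parent only.
    for entry in os.scandir(os.path.join(root, parent)):
        rel = os.path.join(parent, entry.name)
        yield rel
        if entry.is_dir():
            yield from listpath(root, rel)

def demo():
    # Build a tiny temporary tree: a.txt, sub/, sub/b.txt (illustrative names).
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, 'sub'))
        for rel in ('a.txt', os.path.join('sub', 'b.txt')):
            open(os.path.join(root, rel), 'w').close()
        # sorted() because os.scandir yields entries in arbitrary order.
        return sorted(listpath(root))

result = demo()
print(result)  # three relative paths, none of them containing the temp root
```

Since os.path.join('', name) returns just name, the first level comes out with no leading separator.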

Since Python 3.10, glob.glob accepts a root_dir parameter, which lets you do this with the built-in glob:

import glob
glob.glob('**/*', root_dir='d:\\temp', recursive=True)

You could also use a third-party library such as wcmatch (of which I am the author), which already implements this behavior. But in this simple case, your listpath approach may be sufficient.

Basj
facelessuser
  • Thanks for this great solution @facelessuser! I just edited and replaced `parent=None`, by `parent=""`, then `os.path.join` always works, no need to test if parent is None. Is it ok for you? – Basj Dec 04 '20 at 07:59
  • As a sidenote, your solution is 2x to 5x faster than the 3 unsatisfying solutions that I mentioned in the original question, great! :) (Tested with 20k files in 1k subdirs) – Basj Dec 04 '20 at 08:20
  • Yes, there is a bit more overhead compiling glob patterns and then doing actual comparisons, if all you want to do is list folders, taking that all out is much faster. Glob has its place, but if you do not need the power of matching file patterns, a simple file listing is more than sufficient. – facelessuser Dec 04 '20 at 14:28
  • As far as changing `parent=None` is concerned, off the top of my head, I don't see a problem, but obviously, I haven't tested it to a great extent. I'm usually pretty explicit about my conditions to avoid doing unnecessary work, but I don't see an immediate problem with always joining. – facelessuser Dec 04 '20 at 14:32