
I have been using `os.walk()` to list all subfolders and files in a directory tree, but I heard that `os.scandir()` does the job 2x to 20x faster. So I tried this code:

def tree2list (directory:str) -> list:
    import os
    tree = []
    counter = 0
    for i in os.scandir(directory):
        if i.is_dir():
            counter+=1
            tree.append ([counter,'Folder', i.name, i.path])  ## doesn't list the whole tree
            tree2list(i.path)
            #print(i.path)  ## this line prints all subfolders in the tree
        else:
            counter+=1
            tree.append([counter,'File', i.name, i.path])
            #print(i.path)  ## this line prints all files in the tree
    return tree

and when I test it:

## tester
folder = 'E:/Test'
print(tree2list(folder))

I get only the contents of the root directory and nothing from the subdirectories further down the tree, even though all the print statements in the code above work fine.

[[1, 'Folder', 'Archive', 'E:/Test\\Archive'], [2, 'Folder', 'Source', 'E:/Test\\Source']]

What have I done wrong, and how can I fix it?

Leo Sam
  • Does this answer your question? [How do I use os.scandir() to return DirEntry objects recursively on a directory tree?](https://stackoverflow.com/questions/33135038/how-do-i-use-os-scandir-to-return-direntry-objects-recursively-on-a-directory) – Finomnis Jul 11 '22 at 11:56
  • You never propagate out the found paths from the recursive function calls. `tree` is local to the current call, it is not shared between recursive calls. And you never write the paths from the next `tree2list` recursion into it, you only write the topmost ones in. – Finomnis Jul 11 '22 at 11:57
  • I used the same idea as the `get_tree_size` function in https://peps.python.org/pep-0471/#examples. It uses recursion the same way I did! @Finomnis – Leo Sam Jul 11 '22 at 12:42
  • No it didn't. `total += get_tree_size(entry.path)` - this is where the result of the lower part of the tree gets added to the total. This is exactly what you are missing. – Finomnis Jul 11 '22 at 13:51
  • `os.walk`: “Changed in version 3.5: This function now calls os.scandir() instead of os.listdir(), making it faster by reducing the number of calls to os.stat().” So just use `os.walk`. – Mark Tolonen Jul 11 '22 at 13:54
  • @MarkTolonen, great info. It seems os.walk() works much better for me. – Leo Sam Jul 11 '22 at 15:27
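The `os.walk()` suggestion from the comments could look roughly like the sketch below. The function name `tree2list_walk` is made up for illustration, and the `[kind, name, path]` row layout simply mirrors the question's format without the counter.

```python
import os
from typing import List


def tree2list_walk(directory: str) -> List[List[str]]:
    """List every folder and file below `directory` using os.walk()."""
    tree = []
    # os.walk() yields one (dirpath, dirnames, filenames) triple per directory
    # and handles the descent into subdirectories itself.
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in dirnames:
            tree.append(["Folder", name, os.path.join(dirpath, name)])
        for name in filenames:
            tree.append(["File", name, os.path.join(dirpath, name)])
    return tree
```

Calling `tree2list_walk('E:/Test')` should then return entries from every level of the tree, not only the root, because no manual recursion is involved.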

3 Answers


Using generators (`yield`, `yield from`) lets you handle the recursion with concise code:

import os
from pprint import pprint
from typing import Iterator, List


def tree2list(directory: str) -> Iterator[List[str]]:
    for i in os.scandir(directory):
        if i.is_dir():
            yield ["Folder", i.name, i.path]
            # Delegate to the recursive call so its entries are yielded too.
            yield from tree2list(i.path)
        else:
            yield ["File", i.name, i.path]


folder = "/home/yfgy6415/dev/tmp"
pprint(list(tree2list(folder)))

Or use `pprint(list(enumerate(tree2list(folder), start=1)))` if you want the counter.

Gelineau

Your code almost works; only a minor modification is required:

def tree2list(directory: str) -> list:
    import os
    tree = []
    counter = 0
    for i in os.scandir(directory):
        if i.is_dir():
            counter += 1
            tree.append([counter, 'Folder', i.name, i.path])
            tree.extend(tree2list(i.path))  # merge the subtree's entries into this call's list
            # print(i.path)  ## this line prints all subfolders in the tree
        else:
            counter += 1
            tree.append([counter, 'File', i.name, i.path])
            # print(i.path)  ## this line prints all files in the tree
    return tree

I don't understand the purpose of the counter variable, though. It restarts at 0 in every recursive call, so the numbers are not unique across the tree, and I'd probably remove it.

Further, I have to agree with @Gelineau that your approach relies quite heavily on array copies and is therefore most likely quite slow. An iterator-based approach, as in his answer, is better suited to a large number of files.
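As a rough illustration of an iterator-based traversal that builds no intermediate lists, here is a minimal sketch that swaps the recursion for an explicit stack. This is a substitution for illustration, not the approach from either answer; the name `iter_tree` and the choice not to follow symlinks are assumptions.

```python
import os
from typing import Iterator, Tuple


def iter_tree(directory: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (kind, name, path) for every entry below `directory`."""
    stack = [directory]  # directories still waiting to be scanned
    while stack:
        current = stack.pop()
        # The context manager closes the scandir iterator promptly.
        with os.scandir(current) as entries:
            for entry in entries:
                # Assumption: symlinked folders are not descended into,
                # which avoids endless loops on circular links.
                if entry.is_dir(follow_symlinks=False):
                    yield ("Folder", entry.name, entry.path)
                    stack.append(entry.path)
                else:
                    # Anything that is not a real directory (symlinks
                    # included) is reported as a file here.
                    yield ("File", entry.name, entry.path)
```

Because entries are produced one at a time, the caller can stop early or stream them elsewhere without holding the whole tree in memory. Note that the stack makes the visiting order differ from the recursive version.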

Finomnis
  • great answer! I deleted "counter" as it didn't serve its purpose as counter. I intended to use it as item counter when transferring to a csv file. – Leo Sam Jul 11 '22 at 15:24
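For the CSV use case mentioned in the comment above, the item number can be generated while writing instead of inside the traversal. The sketch below is one possible way to do it; `tree_to_csv` and the column headers are hypothetical names, and `tree2list` is repeated here in its generator form only to keep the example self-contained.

```python
import csv
import os
from typing import Iterator, List


def tree2list(directory: str) -> Iterator[List[str]]:
    """Generator version: yield [kind, name, path] for the whole tree."""
    for entry in os.scandir(directory):
        if entry.is_dir():
            yield ["Folder", entry.name, entry.path]
            yield from tree2list(entry.path)
        else:
            yield ["File", entry.name, entry.path]


def tree_to_csv(directory: str, csv_path: str) -> int:
    """Write one numbered row per entry and return the number of rows."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["No", "Type", "Name", "Path"])  # assumed header names
        # enumerate() supplies a single running counter for the whole tree.
        for count, row in enumerate(tree2list(directory), start=1):
            writer.writerow([count, *row])
    return count
```

If `csv_path` lies inside `directory`, the output file itself will show up in the listing, so writing it somewhere else is probably the safer choice.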

Adding to the accepted answer: in case you need all files in the directory and its subdirectories that match some pattern (`*.py`, for example):

import os
from fnmatch import fnmatch


def file_tree_fn(root):
    """Return the paths of all *.py files under root, searched recursively."""
    file_list = []
    for entry in os.scandir(str(root)):
        if entry.is_dir():
            # Descend into the subdirectory and merge its matches.
            file_list.extend(file_tree_fn(entry.path))
        elif entry.is_file() and fnmatch(entry.name, "*.py"):
            file_list.append(entry.path)
    return file_list


root = "."  # assumption: the directory to search; replace with your own path
print(file_tree_fn(root))
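The same pattern search can probably be written more compactly with `pathlib`, which is a different technique from the `os.scandir()` recursion above. A minimal sketch, where `find_files` is a hypothetical helper name:

```python
from pathlib import Path
from typing import List


def find_files(root: str, pattern: str = "*.py") -> List[str]:
    """Return sorted paths of regular files below `root` matching `pattern`."""
    # Path.rglob() searches `root` and all of its subdirectories recursively.
    # is_file() filters out directories whose names happen to match.
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())
```

For example, `find_files(".", "*.py")` would list the Python files under the current directory. Whether this is faster than the `os.scandir()` version is not something this sketch claims.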
ans2human