
I have a Python script that recursively searches folders to find matching file names. I was wondering if I could improve performance with threads or something like that.

import os

def search_folder(root, name, check_case = False):

    for root, dirs, files in os.walk(root):
        if check_case:
            if name in files:
                print(os.path.join(root, name))
        else:
            if name.lower() in map(lambda x: x.lower(), files):
                print(os.path.join(root, name))
        for directory in dirs:
            search_folder(os.path.join(root, directory), name, check_case)

def main():
    root_path, desired_file, check_case = "C:/", "hollow", "n"

    search_folder(root_path, desired_file, check_case)

if __name__ == "__main__":
    main()
    If you knew that there are multiple somewhat equally sized directories in the root, you could assign separate threads to them. But this requires outside knowledge that can't be derived by the script. Even then, the benefit would likely be marginal as your task is I/O bound. – Jan Wilamowski Jun 08 '22 at 01:31
  • Got it. I was wondering if there could be a way to have a set number of threads running and when one is free, I could give it a function to run. But I guess you are saying that wouldn't be that much faster either. – ertucode Jun 08 '22 at 01:43
  • 1
    Sure, that's what thread pools are for. But there's always a startup cost and your case, reading the file system, won't be improved by multiple CPU threads competing for the same BUS to access the disk. You could verify how much time is spent waiting for IO by profiling your function. – Jan Wilamowski Jun 08 '22 at 01:46
  • Your recursive call to `search_folder()` is pointless - `os.walk()` recurses into subdirectories already. Also note that `"n"` as the value for `check_case` is probably not doing what you expect - all non-empty strings are truthy. – jasonharper Jun 08 '22 at 03:08
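Putting the comments together, here is a minimal sketch of the thread-pool idea: fix the two bugs noted above (drop the manual recursion, since `os.walk()` already recurses, and pass a real boolean instead of `"n"`), then fan one worker out per immediate subdirectory of the root. The names `search_subtree` and `threaded_search` and the `max_workers=4` default are my own choices, not from the question. As the comments point out, the task is I/O bound, so on a single local disk any speedup may be marginal.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def search_subtree(root, name, check_case=False):
    """Collect matching paths under one subtree; os.walk recurses by itself."""
    target = name if check_case else name.lower()
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            candidate = filename if check_case else filename.lower()
            if candidate == target:
                # Report the file's actual name, not the query string,
                # so case-insensitive hits show their real casing.
                matches.append(os.path.join(dirpath, filename))
    return matches

def threaded_search(root, name, check_case=False, max_workers=4):
    """Search each immediate subdirectory of root in its own pool worker."""
    # Scan the root's own (non-directory) entries first so they aren't missed.
    target = name if check_case else name.lower()
    results = [
        os.path.join(root, f)
        for f in os.listdir(root)
        if os.path.isfile(os.path.join(root, f))
        and (f if check_case else f.lower()) == target
    ]
    subdirs = [e.path for e in os.scandir(root) if e.is_dir(follow_symlinks=False)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for matches in pool.map(lambda d: search_subtree(d, name, check_case), subdirs):
            results.extend(matches)
    return results
```

Usage would look like `threaded_search("C:/", "hollow", check_case=False)`. Note that the benefit depends on the subtrees being roughly equal in size; a single huge subdirectory still serializes most of the work onto one thread.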

0 Answers