I am using os.walk to search a directory tree for directories whose names include any of the strings in my_list.
Directory tree:
./user/zebra/
./user/zebra/zebra_01/
./user/zebra/zebra_02/
./user/lion/
./user/lion/lion_01/
./user/lion/lion_01/giraffe_02
./user/giraffe/
./user/giraffe/giraffe_01
my_list = ['zebra', 'giraffe']
My script:
import os

for dirpath, dirnames, filenames in os.walk(<path_to_directory_tree>, topdown=True):
    for folders in dirnames:
        for x in my_list:
            if x in folders:
                source_paths = os.path.join(dirpath, folders)
Output (i.e. print(source_paths)):
./user/zebra/
./user/zebra/zebra_01/
./user/zebra/zebra_02/
./user/lion/lion_01/giraffe_02/
./user/giraffe/
./user/giraffe/giraffe_01
I can then further process this output to retain only the desired paths:
./user/zebra/
./user/lion/lion_01/giraffe_02/
./user/giraffe/
But with a massive directory tree, this method takes a very long time. Therefore, I want to avoid generating and then filtering the initial output: os.walk should stop descending into a directory as soon as its name matches an entry in my_list, so that only the desired paths are generated.
I have seen dirnames[:] = [] used, but this would retain only ./user/giraffe/ (but not ./user/zebra/).
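For reference, here is a minimal sketch of the pruning I am after. Rather than emptying dirnames, it removes only the matched names in place, so os.walk still descends into non-matching directories such as lion. The function name find_top_matches and its parameters are hypothetical, chosen for illustration only.

```python
import os


def find_top_matches(root, keywords):
    """Yield the topmost directories under root whose name contains a keyword.

    Hypothetical helper: matched directories are reported once and then
    removed from dirnames, so os.walk never descends into them.
    """
    for dirpath, dirnames, _filenames in os.walk(root, topdown=True):
        # Names in this directory that contain at least one keyword.
        matched = {d for d in dirnames if any(k in d for k in keywords)}
        for d in sorted(matched):
            yield os.path.join(dirpath, d)
        # In-place slice assignment is what os.walk honours when topdown=True;
        # only the matched names are dropped, the rest are still walked.
        dirnames[:] = [d for d in dirnames if d not in matched]
```

With the tree above, list(find_top_matches('.', my_list)) should produce only the three desired paths, in whatever order os.walk visits them.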