I have a directory tree with csv files, and I want to return files following this pattern (the pattern is from somewhere else, so I will need to stick to that):
"foo" should match foo/**/*.csv and/or foo.csv, so that "foo/bar" matches e.g. foo/bar.csv, foo/bar/baz.csv and foo/bar/baz/qux.csv.
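To make the rule concrete, here is a minimal sketch of it as a pure check on path strings, without touching the file system. The function name `matches` and the extra candidates `foo/barn.csv` and `foo/bar.txt` are only illustrative, and I am assuming paths are given relative to the root directory with `/` as the separator.

```python
from pathlib import PurePosixPath

def matches(pattern, rel_path):
    """Return True if rel_path (relative to the root, '/'-separated) fits pattern."""
    path = PurePosixPath(rel_path)
    if path.suffix != ".csv":
        return False
    target = PurePosixPath(pattern).parts
    # Drop the ".csv" extension so "foo/bar.csv" compares as ("foo", "bar").
    parts = path.with_suffix("").parts
    # True if the file is exactly pattern + ".csv", or lies somewhere
    # below the directory named by pattern.
    return parts[:len(target)] == target

for candidate in ["foo/bar.csv", "foo/bar/baz.csv", "foo/bar/baz/qux.csv",
                  "foo/barn.csv", "foo/bar.txt"]:
    print(candidate, matches("foo/bar", candidate))
```

Comparing path components rather than raw strings keeps a sibling such as foo/barn.csv from being picked up by the pattern "foo/bar".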
So far, I have been iterating through the directory tree twice: first for the CSV files inside the matching directory, and then for the matching .csv file itself:
from glob import iglob
from itertools import chain
import os
path = "csv_dir"
pattern = "foo/bar"
pattern = os.path.join(*pattern.split("/"))
path_with_pattern = os.path.join(path, pattern)
# first get all csv files in foo/bar and subdirs
files_1 = chain.from_iterable(iglob(os.path.join(root, '*.csv'))
                              for root, dirs, files in os.walk(path_with_pattern))
# then get all foo/bar.csv files
files_2 = chain.from_iterable(iglob(os.path.join(root, pattern + '.csv'))
                              for root, dirs, files in os.walk(path))
for f in chain(files_1, files_2):
    print(f)
This works, but it feels wasteful to walk the tree twice. Is there a file-matching method I have missed that covers both cases in one pass? Or is there a simple way to select the matching files if I start by collecting all CSV files in the tree?
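For reference, the kind of thing I have in mind might look like the rough sketch below. It assumes the pattern is anchored at the top of the root directory, so it would not find a foo/bar.csv nested deeper in the tree the way my second walk does. The name `find_csv` is just a placeholder.

```python
import os

def find_csv(path, pattern):
    """Yield CSV files under path that fit pattern, without walking path twice."""
    base = os.path.join(path, *pattern.split("/"))
    # The "foo/bar.csv" case is a single file at a known location,
    # so an existence check replaces the second walk.
    single = base + ".csv"
    if os.path.isfile(single):
        yield single
    # The "foo/bar/**/*.csv" case: walk only the matching subtree.
    # os.walk yields nothing if base does not exist.
    for root, dirs, files in os.walk(base):
        for name in files:
            if name.endswith(".csv"):
                yield os.path.join(root, name)

for f in find_csv("csv_dir", "foo/bar"):
    print(f)
```

This only ever visits the subtree named by the pattern, so the rest of the tree is never scanned.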