Select files from specific directories

Question

I am trying to loop through a list of subdirectories and perform two related operations:

1. Select only the subdirectories that match a certain pattern, and save part of that name.
2. Read a file in each selected subdirectory.

I have tried adapting the answers in this question but am having trouble opening only certain subdirectories. I know I can do this recursively, where I loop through every file, and pull its parent directory using Path.parent, but this would also go into the directories I am not interested in.

My file structure looks like:

002normal
|- names.txt
|- test.txt
002custom
|- names.txt
|- test.txt

I would like only the directories ending in "normal". I'll then read the file named "names.txt" in that directory. I have tried something like the below, without luck.

import os
root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    for f in files:
        print(subdir)

blhsing · Accepted Answer · 2019-12-14T00:58:15.827

1

You can modify the dirs list in-place to filter out any subdirectories with names not ending with 'normal' so that os.walk won't traverse into them:

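import os

root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    # Slice assignment edits the list in place, so os.walk skips the removed names
    dirs[:] = (name for name in dirs if name.endswith('normal'))
    if 'names.txt' in files:
        with open(os.path.join(subdir, 'names.txt')) as file:
            print(os.path.basename(subdir), file.read())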

Excerpt from the documentation of os.walk:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
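
As a minimal sketch of the pruning described in the excerpt, the snippet below builds a throwaway tree that mirrors the layout in the question and collects the contents of names.txt from the directories that survive the filter. The function name `read_names`, its parameters, and the file contents are made up for illustration. Note that plain assignment (`dirs = [...]`) would only rebind the local name and would not prune anything; the slice assignment is what changes the list that os.walk keeps using.

```python
import os
import tempfile


def read_names(root_dir, suffix="normal", filename="names.txt"):
    """Return {directory name: file contents} for directories ending in suffix."""
    found = {}
    for subdir, dirs, files in os.walk(root_dir):  # topdown=True is the default
        # In-place edit: os.walk will not descend into the names removed here.
        dirs[:] = [d for d in dirs if d.endswith(suffix)]
        # Skip the root itself, which is visited regardless of its name.
        if subdir != root_dir and filename in files:
            with open(os.path.join(subdir, filename)) as fh:
                found[os.path.basename(subdir)] = fh.read()
    return found


# Throwaway tree mirroring the layout in the question (contents are invented).
with tempfile.TemporaryDirectory() as root:
    for name in ("002normal", "002custom"):
        os.makedirs(os.path.join(root, name))
        with open(os.path.join(root, name, "names.txt"), "w") as fh:
            fh.write("names in " + name)
    result = read_names(root)

print(result)
```

Because the filter is applied at every level, a directory nested inside 002normal would also have to end in "normal" to be visited.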

edited Dec 14 '19 at 00:58

answered Dec 14 '19 at 00:31

blhsing


I like this, it's very elegant. The 2nd-to-last line needs to be `with open(os.path.join(root_dir, subdir, 'names.txt'), 'r') as f:` though – Adam_G Dec 14 '19 at 00:45
Also, how do I extract the directory name, e.g. "002normal", assuming there are other directories above it? – Adam_G Dec 14 '19 at 00:53

Ah indeed. I've updated my answer to join `subdir` with `names.txt` (no need for `root_dir` by the way). The output also now includes the extraction of the base name of the sub-directory by calling `os.path.basename`. – blhsing Dec 14 '19 at 01:01
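
To illustrate why `root_dir` is not needed in the join, here is a small sketch using the path from the question. It assumes `subdir` holds the value os.walk would yield for 002normal; the variable names `joined`, `direct`, `doubled`, `name`, and `prefix` are only illustrative. The last lines show one way to save part of the directory name.

```python
import os

root_dir = "/Users/adamg/IM-logs"
# What os.walk(root_dir) would yield as subdir for the 002normal directory.
subdir = os.path.join(root_dir, "002normal")

# When a later component is absolute, os.path.join discards what came before,
# so passing root_dir as well changes nothing here.
joined = os.path.join(root_dir, subdir, "names.txt")
direct = os.path.join(subdir, "names.txt")
print(joined == direct)

# With a relative root, the extra component would duplicate the prefix instead.
doubled = os.path.join("logs", os.path.join("logs", "002normal"), "names.txt")
print(doubled)

# basename gives the last path component; slicing drops the known suffix.
name = os.path.basename(subdir)
prefix = name[: -len("normal")]
print(name, prefix)
```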

Boendal · Answer 2 · 2019-12-14T01:07:56.213

1

import os

root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    # subdir is the path of the directory currently being visited
    if subdir.endswith("normal"):
        for file in files:
            if file.startswith("names"):
                print(os.path.basename(subdir), file)
                # subdir already starts with root_dir, so join only subdir and file
                with open(os.path.join(subdir, file)) as f:
                    print(f.read())

That's how you can do it with your file structure. For each directory that os.walk visits, check whether its path ends with "normal"; if it does, look for the file and print its contents. Build the path to the file with os.path.join so that it can be opened.

os.walk already descends into subdirectories of any depth, so no extra loop is needed: this works as long as the directory that contains names.txt has a name ending in "normal".
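
For comparison, a similar filter can be sketched with pathlib, which the question already mentions. This is an alternative technique, not what either answer uses, and the helper name `read_names_glob` and the sample tree (including the `archive/2019` level) are invented for the example. Like the loop above, it does not prune: every directory under the root is still scanned, and only the matches are returned.

```python
import tempfile
from pathlib import Path


def read_names_glob(root_dir, suffix="normal", filename="names.txt"):
    """Return {directory name: file contents} for matching files at any depth."""
    found = {}
    # "**" matches the root and every directory below it.
    for path in Path(root_dir).glob(f"**/*{suffix}/{filename}"):
        found[path.parent.name] = path.read_text()
    return found


# Throwaway tree: two top-level directories plus one nested match.
with tempfile.TemporaryDirectory() as root:
    for rel in ("002normal", "002custom", "archive/2019/003normal"):
        target = Path(root, rel)
        target.mkdir(parents=True)
        (target / "names.txt").write_text("names in " + target.name)
    result = read_names_glob(root)

print(sorted(result))
```

If two matching directories share a name, the later one overwrites the earlier one in the dictionary, so keying on the full path may be preferable in that case.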

edited Dec 14 '19 at 01:07

answered Dec 14 '19 at 00:32

Boendal


Thanks. How do I extract the directory name, e.g. "002normal", assuming there are other directories above it? – Adam_G Dec 14 '19 at 00:53

As blhsing already said, you can use os.path.basename(subdir). I will add it to my answer as well. – Boendal Dec 14 '19 at 01:07
