
I need to recursively find all paths below a folder whose names contain the substring "Bar", in Python 2.7.

That is, for the folder structure

Foo
|
------ Doug
|        |
|        --------CandyBar
|
---------MilkBar

I need to get the list ["Foo/Doug/CandyBar", "Foo/MilkBar"]

Now, I can use os.walk and glob.glob and write a bunch of loops to get this list, but I'm wondering if I'm missing a simpler technique.

Dietrich Epp
empty
  • You don't need to write a bunch of loops. Check the answer here: https://stackoverflow.com/a/2186565/3101082 for using `glob.glob` recursively. – dub stylee Jan 23 '18 at 00:32
  • Try writing some code and showing us, you might come up with something "Pythonic" all by yourself! Some hints: try using recursion and list comprehensions. – Turksarama Jan 23 '18 at 00:35
  • I've removed the request for something “pythonic” just because that can be a bit subjective, otherwise the question is good. – Dietrich Epp Jan 23 '18 at 01:04
  • Glob can also find folders only, see: https://stackoverflow.com/a/36426997/2305545 – NOhs Jan 23 '18 at 01:18

2 Answers


A generator may be a good choice here:

import os
# the test is case-sensitive, so look for "Bar" as in the question
res = (path for path,_,_ in os.walk("path") if "Bar" in path)

NOTE: in the examples below I use "/" as the root path because my system is Unix-like. If you are on Windows, substitute "/" with "C:\" (or whatever you want).
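One caveat with the substring test above: it is applied to the whole path, so every subfolder of a matching folder matches as well. Here is a minimal sketch, assuming that only the folder's own name should be tested and that the starting folder itself should be skipped. `find_dirs` is a hypothetical helper name, not something from the question.

```python
import os


def find_dirs(top, needle):
    """Yield each directory below `top` whose own name contains `needle`."""
    for path, _, _ in os.walk(top):
        if path == top:
            # os.walk yields the starting folder first; skip it
            continue
        # test only the last path component, not the whole path
        if needle in os.path.basename(path):
            yield path
```

For the tree in the question, `list(find_dirs("Foo", "Bar"))` should contain `Foo/Doug/CandyBar` and `Foo/MilkBar` (the order depends on how the OS lists directories).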

PROS:

  • Generators use far less memory and do not block the program while the full result is being computed.

example:

# returns immediately: nothing is walked yet
res = (path for path,_,_ in os.walk("/") if "Bar" in path)

# blocks until the whole tree has been walked (who knows how long)
res = [path for path,_,_ in os.walk("/") if "Bar" in path]
  • You can get one path at a time, waiting only as long as it takes to find the next path.

example:

res = (path for path,_,_ in os.walk("/") if "Bar" in path)
# the loop starts immediately
for path in res:
    # each iteration waits only as long as it takes to find the next path
    print(path) # each path is printed as soon as it is found

res = [path for path,_,_ in os.walk("/") if "Bar" in path]
# the loop starts only after all paths have been computed
for path in res:
    # no waiting inside the loop
    print(path) # all paths are printed at once
  • If you want to keep some of the paths found, you CAN STORE them in a list and hold only the paths you are interested in (less memory usage).

example:

res = (path for path,_,_ in os.walk("/") if "Bar" in path)
path_store = []
for path in res:
    # I'm only interested in paths of odd length;
    # at the end of the loop only the memory actually needed is in use
    if len(path) % 2 == 1:
        path_store.append(path)
  • If at some point you are done and not interested in any more paths, you can stop at any moment and save the time that computing the remaining paths would have taken.

example:

res = (path for path,_,_ in os.walk("/") if "Bar" in path)
path_store = []
count = 10
for path in res:
    # I'm only interested in paths of odd length
    if len(path) % 2 == 1:
        count -= 1
        path_store.append(path)
        # I'm only interested in the first 10 such paths.
        # With a generator I wait only for the work needed to reach them.
        # With a list I would always wait for all paths to be computed.
        if count <= 0:
            break
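The "stop early" idea can also be written without a manual counter. This is only a sketch using `itertools.islice` from the standard library; `first_matches` and its parameters are hypothetical names chosen for illustration.

```python
import itertools
import os


def first_matches(top, needle, limit):
    """Return at most `limit` directories under `top` whose path contains `needle`."""
    matches = (path for path, _, _ in os.walk(top) if needle in path)
    # islice stops pulling from the generator after `limit` items,
    # so the rest of the tree is never walked
    return list(itertools.islice(matches, limit))
```

For example, `first_matches("/", "Bar", 10)` returns as soon as ten matching paths have been found, or earlier if the tree holds fewer.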

CONS:

  • You can't use indexes with generators. You can only get the next item.

  • If you want all paths at once, you have to convert the generator to a list (in that case it is better to use a list comprehension from the start).

  • Generators are one-shot and forward-only (you can't go back after getting the next element).

  • If you want to keep a path, you HAVE TO store it somewhere (for example in a list), otherwise it is lost.

In the code below, path is overwritten at each iteration. At the end of the loop res is exhausted and can no longer be used, so I have to store the paths I'm interested in in the list path_store.

res = (path for path,_,_ in os.walk("/") if "Bar" in path)
path_store = []
for path in res:
    # I'm only interested in paths of odd length
    if len(path) % 2 == 1:
        path_store.append(path)
path = next(res) # raises StopIteration: res is exhausted
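If the one-shot behaviour is a problem, one possible workaround (a sketch, not part of the original answer) is to wrap the generator expression in a function, so each call gives a fresh generator. Passing a default to `next` avoids the `StopIteration`. The names `bar_paths` and `demo` are hypothetical.

```python
import os


def bar_paths(top, needle="Bar"):
    """Return a fresh generator on every call."""
    return (path for path, _, _ in os.walk(top) if needle in path)


def demo(top):
    res = bar_paths(top)
    first_pass = list(res)              # consumes the generator completely
    leftover = next(res, None)          # the default is returned instead of raising
    second_pass = list(bar_paths(top))  # a new call walks the tree again
    return first_pass, leftover, second_pass
```

The price is that the tree is walked again on every call, so this only makes sense when the second pass is really needed.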
alp

Try this:

import os
# the test is case-sensitive, so look for "Bar" as in the question
[x for x, _, _ in os.walk("path") if "Bar" in x and os.path.isdir(x)]
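A variation, offered only as a sketch: the second item that `os.walk` yields is already the list of subdirectory names, so the folder names can be tested directly, and `fnmatch.fnmatchcase` allows shell-style patterns that stay case-sensitive on every platform. `dirs_matching` is a hypothetical helper name.

```python
import fnmatch
import os


def dirs_matching(top, pattern):
    """List directories below `top` whose own name matches `pattern`."""
    return [
        os.path.join(root, name)
        for root, dirs, _ in os.walk(top)
        for name in dirs  # os.walk has already separated directories from files
        if fnmatch.fnmatchcase(name, pattern)  # case-sensitive shell-style match
    ]
```

With the tree from the question, `dirs_matching("Foo", "*Bar*")` should give the two paths asked for, without an extra `os.path.isdir` call.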
whackamadoodle3000
  • You can make this a little cleaner if you unpack as `r, _, _`, discarding the second and third argument. Then, you'd just need `if "bar" in r`. – cs95 Jan 23 '18 at 00:37
  • I'm sorry, but what does the `_` do? – whackamadoodle3000 Jan 23 '18 at 00:42
  • It's to signify you only want a subset of the arguments and are throwing away the rest. – cs95 Jan 23 '18 at 00:43
  • Oh, I get it. So it allows you to only unpack some of the arguments. – whackamadoodle3000 Jan 23 '18 at 00:44
  • You unpack them all... but you discard some of them (in this case, everything except the first argument). You can also do `r, *_` for Python versions that support it. – cs95 Jan 23 '18 at 00:44
  • OP's question title suggests that they want to find folders only. If that is really the case, use `os.path.isdir()` to satisfy that: `[x[0] for x in os.walk("path") if "bar" in x[0] and os.path.isdir(x[0])]` – mhawke Jan 23 '18 at 00:47