
How do I return all folders in a drive?

`os.listdir(dir)` would be a candidate, but it only returns the immediate contents of one directory; I want all subdirectories, recursively.

I saw another Stack Overflow answer (https://stackoverflow.com/a/973488/18323484) suggest using `[x[0] for x in os.walk(directory)]`, but I couldn't understand how to use it.

Oskar
  • It's pretty clear how to use it: just give it the directory whose tree you want to search, and you'll get a list of all the directories. – Matiiss Jul 28 '22 at 11:31
  • Does this answer your question? [Getting a list of all subdirectories in the current directory](https://stackoverflow.com/questions/973473/getting-a-list-of-all-subdirectories-in-the-current-directory) You haven't asked anything specific yet, so the question you looked at actually answers yours. – Matiiss Jul 28 '22 at 11:32
  • Look at https://stackoverflow.com/a/64086033/4865723 and use the modern and recommended `pathlib.Path` solutions. – buhtz Jul 28 '22 at 12:07
  • @buhtz: That solution (which does not recurse) is an egregious hack mixing `pathlib.Path` with `glob.glob` (AFAICT because it relies on a subtle difference in behavior between `glob.glob` and `pathlib.Path.glob` when you try to `glob` a pattern ending with `*/`). `pathlib` stuff is fine, but use it consistently, e.g. `[pth for pth in Path(directory).glob('**') if pth.is_dir()]` or `[pth for pth in Path(directory).glob('**/')]`; see the sketch after these comments. It's going to be slower than `os.walk`, though (`Path` objects don't cache `stat` info, while `os.walk` uses `scandir`, getting basic type info for free without `stat`ing). – ShadowRanger Jul 28 '22 at 12:57
  • You can iterate through that list of subdirectories and do things with them. What don't you understand? – Eric Jin Jul 28 '22 at 14:02
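
For reference, here's a minimal, runnable sketch of the pure-`pathlib` approach ShadowRanger describes above; `some_directory` is a placeholder for whatever root you want to search:

from pathlib import Path

root = Path('some_directory')  # placeholder root directory

# '**' matches this directory and every subdirectory, recursively;
# the is_dir() filter defensively keeps only directories.
dirs = [pth for pth in root.glob('**') if pth.is_dir()]

# On Python 3.11+, a pattern ending with a path separator matches only
# directories, so the filter can be dropped:
# dirs = list(root.glob('**/'))

As the comment notes, this is typically slower than `os.walk`, which gets basic entry-type information for free via `scandir`.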

1 Answer


`os.walk` yields a three-tuple for each directory traversed, in the form `(currentdir, containeddirs, containedfiles)`. This listcomp:

[x[0] for x in os.walk(directory)]

just ignores the contents of each directory and accumulates only the directories it enumerates. It would be slightly nicer/more self-documenting if written with unpacking (using `_` for values you don't care about), e.g.:

dirs = [curdir for curdir, _, _ in os.walk(directory)]

but the two are equivalent. To list every directory on an entire drive, just provide the root of the drive as the directory argument to `os.walk`, e.g. for Windows:

c_drive_dirs = [curdir for curdir, _, _ in os.walk('C:\\')]

or for non-Windows:

alldirs = [curdir for curdir, _, _ in os.walk('/')]
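
If the shape of those tuples is still unclear, here's an illustrative loop (using a hypothetical project directory as the root) that prints each piece as `os.walk` visits it:

import os

# Illustrative only: show the three-tuple os.walk yields per directory.
for curdir, dirnames, filenames in os.walk('project'):
    print('directory:', curdir)    # path of the directory being visited
    print('subdirs:  ', dirnames)  # names of its immediate subdirectories
    print('files:    ', filenames) # names of its immediate files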
ShadowRanger
  • You could also use `*_` instead of `_, _` there, since I don't think it really matters to show how many items are in that tuple, and this is shorter. – Matiiss Jul 28 '22 at 12:09
  • @Matiiss: You could, but 1) I prefer validating that I get the number of items expected and explicitly indicating I know they all exist and am intentionally ignoring them, and 2) using `*_` has additional (admittedly meaningless in the context of file system access) overhead: it has to create/destroy an extra two-item `list` each loop, where `_, _` creates no additional objects, and as a side-effect of reusing `_`, discards the `list` of directories immediately (the `list` of non-dirs immediately replaces it) rather than waiting for the next loop. Roughly doubles the listcomp overhead. – ShadowRanger Jul 28 '22 at 12:33
  • To be clear, when I say "Roughly doubles the listcomp overhead." I'm talking about the overhead of the listcomp itself (if timed with the cached results of an `os.walk`, not calling `os.walk` each time); as noted, said overhead is meaningless next to the cost and variability of file system access. But when I run `%timeit [d for d, _, _ in cached]` and compare to `%timeit [d for d, *_ in cached]` (where `cached = tuple(os.walk('.'))` in a folder with 3000 subdirectories), the cost for the former is ~145 µs, while the latter is ~350 µs (on my Python 3.10.5 x64 Linux install). – ShadowRanger Jul 28 '22 at 12:37
  • that's a huge overhead – Matiiss Jul 28 '22 at 12:47
  • @Matiiss: If you think about the work, it makes sense. You're avoiding a single array store operation (Python locals are assigned compile-time array indices, so storing to one is roughly equivalent to a C array store op, ignoring basic interpreter loop overhead, which they've reduced substantially for common cases like this) in exchange for building/discarding a variable length `list` (it's always length 2, but Python doesn't know that, so it dutifully checks `__length_hint__`, which does not have a C API slot so it's a CPU time pessimization for short inputs), involving two allocations/frees. – ShadowRanger Jul 28 '22 at 13:20
  • When the work per loop is so low (advance iterator via optimized API, C type/length check, three C array stores [locals], one array load [local], a couple refcnt manipulations, and the optimized amortized work `list` append operation), adding an unoptimized (no C API, so it needs to do a `dict` lookup and general function dispatch) function call plus two `malloc`s and two `free`s (`list` uses free store for the object header, so it might avoid one `malloc` and one `free` each loop, but even using the free store has some overhead) is a pretty significant added cost. – ShadowRanger Jul 28 '22 at 13:23
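
For anyone who wants to reproduce that comparison outside IPython, here's a minimal sketch using the stdlib `timeit` module (the root directory and iteration count are arbitrary; absolute numbers will differ by machine, Python version, and tree size):

import os
import timeit

cached = tuple(os.walk('.'))  # walk once so only the listcomps are timed

t_explicit = timeit.timeit('[d for d, _, _ in cached]',
                           globals={'cached': cached}, number=1000)
t_starred = timeit.timeit('[d for d, *_ in cached]',
                          globals={'cached': cached}, number=1000)
print(f'_, _ unpacking: {t_explicit:.4f}s; *_ unpacking: {t_starred:.4f}s')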