I want to build a program that uses some basic code to read through a folder and tell me how many files are in the folder. Here is how I do that currently:

import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)

This works great until there are multiple folders nested inside the "main" folder, at which point it can return a long, junky list because of poor folder/file management. So I would like to go to the second level at most. Example:

Main Folder
---file_i_want
---file_i_want
---Sub_Folder
------file_i_want <--*
------file_i_want <--*
------Sub_Folder_2
---------file_i_dont_want
---------file_i_dont_want

I know how to go to only the first level with a break and with del dirs[:] taken from this post and also this post.

import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
        del dirs[:] # or a break here. does the same thing.

But no matter how much I search, I can't find out how to go two layers deep. I may just not be understanding the other posts on it. I was thinking of something like del dirs[:2], but to no avail. Can someone guide me or explain to me how to accomplish this?

MattR
  • `dirs` is all the directories in the current directory, it's not the depth of the folder tree. – Peter Wood Mar 10 '17 at 14:28
  • @PeterWood, ah! that makes sense. So now i understand why `del dirs[:2]` is silly... – MattR Mar 10 '17 at 14:30
  • It looks like you're using Python 3. Is that correct? – PM 2Ring Mar 10 '17 at 14:41
  • @PM2Ring, that is correct. – MattR Mar 10 '17 at 14:42
  • Note that the codes in the accepted answer of the linked question are suitable for Python 2 as well, despite the question's title. For Python 3 use, they can be cleaned up a little by using `yield from` instead of those `yield` statements in the `for` loops. – PM 2Ring Mar 10 '17 at 14:57
  • @PM2Ring, although the *question may* be a duplicate. the answer is not. the other post was complex and too long compared to what posters have here on this question.. In my opinion, this post is much more concise and to the point. I believe it follows [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) more closely. – MattR Mar 10 '17 at 14:59
  • @MattR thanks for that comment :) Duplicates are sometimes all right, for instance if answers to duplicate questions bring something better or/and if question keywords are so different from the original answer that it improves the chances of future users searching & stumbling on one of those. – Jean-François Fabre Mar 10 '17 at 15:03
  • @MattR I agree, which is why I posted the link to that question instead of dupe-closing it straight away. But note that Kevin posted a short version as well as the long-winded one. – PM 2Ring Mar 10 '17 at 15:05

2 Answers

You could do it like this:

import os

depth = 2

# stuff is one of the folders from the question.
# abspath() also normalizes the path, as normpath() does: it strips any
# trailing os.sep and resolves "..", "." and repeated separators, which
# os.walk would otherwise echo back literally in root. The slice below
# relies on stuff having no trailing separator.
# expanduser() expands ~ and expandvars() expands $HOME. Only use them if
# stuff comes from a trusted source, never on arbitrary strings:
# stuff = os.path.expanduser(os.path.expandvars(stuff))
stuff = os.path.abspath(stuff)

for root, dirs, files in os.walk(stuff):
    # root[len(stuff):] is the path of root relative to stuff
    if root[len(stuff):].count(os.sep) < depth:
        for f in files:
            print(os.path.join(root, f))

The key is: if root[len(stuff):].count(os.sep) < depth

Slicing off the first len(stuff) characters of root leaves the path relative to stuff. Then just count the number of path separators in what remains.

depth behaves like the -maxdepth option of the Linux find command: -maxdepth 0 lists no files, -maxdepth 1 only lists files in the first level, and -maxdepth 2 also includes files in the immediate sub-directories.
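To see what the slice-and-count test evaluates to, here is a small sketch that applies it to a few made-up paths without touching the disk. The helper name relative_depth and the sample folder names are illustrative assumptions, not part of the code above.

```python
import os

def relative_depth(root, base):
    """Count the separators in the part of root that follows base."""
    return root[len(base):].count(os.sep)

# Hypothetical paths, shaped like the tree in the question.
base = "Main"
sub = os.path.join(base, "Sub_Folder")
sub2 = os.path.join(sub, "Sub_Folder_2")

for path in (base, sub, sub2):
    # With depth = 2, only paths whose count is below 2 pass the test.
    print(path, relative_depth(path, base))
```

The top folder itself counts as 0 and each nested level adds one separator, so with depth = 2 the files in Main and Sub_Folder are listed and those in Sub_Folder_2 are not.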

Of course, it still walks the whole directory tree, but unless the tree is very deep that will work.

Another solution would be to call os.listdir recursively (with a directory check) up to a maximum recursion level, but that's a little trickier if you don't need it. Since it's not that hard, here's one implementation:

def scanrec(root):
    rval = []

    def do_scan(start_dir, output, depth=0):
        for f in os.listdir(start_dir):
            ff = os.path.join(start_dir, f)
            if os.path.isdir(ff):
                # descend only while fewer than 2 directory levels below root
                # (use depth < 1 to stop at the first sub-folder level)
                if depth < 2:
                    do_scan(ff, output, depth + 1)
            else:
                output.append(ff)

    do_scan(root, rval, 0)
    return rval

print(scanrec(stuff))  # files in root and in directories up to 2 levels below it

Note: calling os.listdir and then os.path.isdir on every entry costs an extra stat call per entry, so this is not optimal. From Python 3.5 on, os.scandir can avoid that extra call in most cases.
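As a rough sketch of what the os.scandir variant mentioned in the note could look like: the function name scan_limited and the max_dir_depth parameter are made up for illustration, and the default of 2 simply mirrors the depth < 2 check in scanrec. It assumes Python 3.5 or later.

```python
import os

def scan_limited(root, max_dir_depth=2):
    """Return the files under root, descending at most max_dir_depth directory levels."""
    found = []

    def do_scan(start_dir, depth):
        for entry in os.scandir(start_dir):
            # DirEntry.is_dir() can usually answer from data the directory
            # listing already returned, so no separate stat call is needed.
            if entry.is_dir():
                if depth < max_dir_depth:
                    do_scan(entry.path, depth + 1)
            else:
                found.append(entry.path)

    do_scan(root, 0)
    return found
```

Calling scan_limited(stuff, 1) would stop at the first sub-folder level, which matches the tree in the question.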

林果皞
Jean-François Fabre
  • This is what I wanted. and I really appreciate the "Key is" statement. if you could kindly explain or point me to some documentation on how you came to this answer? Im stuck on why the brackets in `root[len(stuff...]` and what `.count(os.sep)` does. Thank you very much. – MattR Mar 10 '17 at 14:40
  • Why not `break` when `root[len(stuff) + 1:].count(os.sep) >= 2`? When going top down, when that is true, it will be true for all other values of `root`. –  Mar 10 '17 at 14:41
  • @DavidCullen good point. Didn't think about that. actually it doesn't work, it seems to only apply for directories. – Jean-François Fabre Mar 10 '17 at 14:41
  • @MattR: I perform slicing on the string to remove the first chars (corresponding to the length of `stuff` in `root`), then I count the slashes/backslashes (OS-dependent: `os.sep`) in the relative path computed that way. – Jean-François Fabre Mar 10 '17 at 14:47
  • @Jean-FrançoisFabre ohhhhhh! i didn't notice the `:` at the end to perform the slicing. That's actually brilliant. thank you! – MattR Mar 10 '17 at 14:49
  • Tracking recursion depth is much cleaner than the clunky `os.sep` counting approach. `os.walk` is implemented in Python (using `.listdir` in Python 2 and `.scandir` in Python 3.5+), so rolling your own version of `.walk` is just as efficient as using the standard one. – PM 2Ring Mar 10 '17 at 14:53
  • @PM2Ring, yes that wasn't too hard to write all things considered (and compared to the nightmarish C version we had to write ourselves). I didn't use `scandir` because I don't have python 3.5 here, but yes that would save some `fstat` calls. – Jean-François Fabre Mar 10 '17 at 14:54
  • @DavidCullen That's not true. In the following setup: `x { a {1, 2}, b {1, 2} }`, the order of the elements would be: `x`, `x\a`, `x\a\1`, `x\a\2`, `x\b`, `x\b\1`, `x\b\2`, so breaking when finding the first `>2` would mean `x\b` would not be iterated over. – Adirio Mar 10 '17 at 14:56
  • @Adirio: You are correct. I wrote a bunch of `os.walk` code a year ago. Apparently, I forgot how it works already. That will teach me to comment without running some code first. –  Mar 10 '17 at 15:16

You can count the separators and, once you are two levels deep, delete the contents of dirs so that walk doesn't recurse any deeper:

import os

MAX_DEPTH = 2
folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
        if root.count(os.sep) - stuff.count(os.sep) == MAX_DEPTH - 1:
            del dirs[:]

The Python documentation says the following about this behavior:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
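The "in-place" part matters. As a minimal sketch (the function name count_visited and its flag are hypothetical), the following contrasts emptying the list that os.walk handed out with merely rebinding the local name, which os.walk never notices:

```python
import os

def count_visited(top, prune_in_place):
    """Walk top, try to stop below the first level, and count the directories visited."""
    visited = 0
    for root, dirs, files in os.walk(top, topdown=True):
        visited += 1
        if prune_in_place:
            dirs[:] = []   # empties the very list os.walk holds, so it stops descending
        else:
            dirs = []      # only rebinds the local name; os.walk still sees the subdirectories
    return visited
```

On a tree with nested sub-folders, the in-place version should visit only the top folder, while the rebinding version visits every directory.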

Note that you need to take into account the separators already present in the starting folder. For example, when y:\path1 is walked, the first root is y:\path1, which already contains a separator, but you don't want to stop the recursion there.
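One way to package the separator bookkeeping is a small generator. This is only a sketch: the name walk_max_depth is made up, it expects max_depth to be at least 1, and it normalizes the starting folder first so that a trailing separator does not skew the count. It does not treat a filesystem root such as / specially.

```python
import os

def walk_max_depth(top, max_depth):
    """Yield (root, dirs, files) like os.walk, visiting at most max_depth directory levels."""
    top = os.path.normpath(top)          # drop any trailing separator before counting
    base_seps = top.count(os.sep)        # separators already present in the starting folder
    for root, dirs, files in os.walk(top, topdown=True):
        yield root, dirs, files
        if root.count(os.sep) - base_seps >= max_depth - 1:
            del dirs[:]                  # prune in place so os.walk goes no deeper
```

With this helper, the loop from the question could become: for root, dirs, files in walk_max_depth(stuff, 2).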

niemmi
  • Thank you for pointing me toward the documentation! is it correct that `root.count(os.sep)` is counting the ``\\`` in the root? – MattR Mar 10 '17 at 14:46
  • @MattR: Yes, it's counting \ (or whatever the OS-specific separator is) in the root, from which we need to subtract the separators present in the initial folder. – niemmi Mar 10 '17 at 14:54
  • As this is closed as duplicate you might want to post your answer there: https://stackoverflow.com/questions/35315873/travel-directory-tree-with-limited-recursion-depth – WinEunuuchs2Unix Jul 21 '20 at 01:52