I want to print the files in subdirectories that are two levels below the root directory. In the shell I can use the following find command:

find -mindepth 3 -type f
./one/sub1/sub2/a.txt
./one/sub1/sub2/c.txt
./one/sub1/sub2/b.txt

How can I accomplish this in Python? I know the basic syntax of os.walk, glob and fnmatch, but I don't know how to specify depth limits (like find's -mindepth and -maxdepth).

Lesmana
mathew

2 Answers

You could use the str.count() method to count the path separators in each directory path and derive the depth from that:

import os

def files(rootdir='.', mindepth=0, maxdepth=float('inf')):
    # Offset chosen so that files directly inside rootdir get depth 1, as in find.
    root_depth = rootdir.rstrip(os.path.sep).count(os.path.sep) - 1
    for dirpath, dirs, filenames in os.walk(rootdir):
        # Depth of the files contained in dirpath, relative to rootdir.
        depth = dirpath.count(os.path.sep) - root_depth
        if mindepth <= depth <= maxdepth:
            for filename in filenames:
                yield os.path.join(dirpath, filename)
        elif depth > maxdepth:
            del dirs[:]  # too deep, don't recurse

Example:

 print('\n'.join(files(mindepth=3)))

The answer to the related question uses the same technique.
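One edge case worth noting: counting separators in `rootdir` itself can misjudge the depth when `rootdir` is the filesystem root (`'/'`), because stripping the trailing separator leaves an empty string. A minimal sketch of a variant that measures depth with `os.path.relpath` instead is shown below. The name `files_by_depth` and the `None` default for `maxdepth` are assumptions made for illustration, not part of the code above.

```python
import os

def files_by_depth(rootdir='.', mindepth=0, maxdepth=None):
    """Yield file paths whose find-style depth lies in [mindepth, maxdepth].

    Hypothetical helper: maxdepth=None is assumed to mean "no upper limit".
    """
    for dirpath, dirs, filenames in os.walk(rootdir):
        rel = os.path.relpath(dirpath, rootdir)
        # Files directly inside rootdir have depth 1, as with find.
        depth = 1 if rel == os.curdir else rel.count(os.path.sep) + 2
        if maxdepth is not None and depth >= maxdepth:
            del dirs[:]  # anything below this directory would exceed maxdepth
        if depth >= mindepth and (maxdepth is None or depth <= maxdepth):
            for name in filenames:
                yield os.path.join(dirpath, name)
```

Usage mirrors the example above: `print('\n'.join(files_by_depth('.', mindepth=3)))`. Pruning `dirs` as soon as `depth >= maxdepth` also avoids descending one level further than necessary.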

Community
jfs

You cannot specify any of this to os.walk. However, you can write a function that does what you have in mind.

import os
def list_dir_custom(mindepth=0, maxdepth=float('inf'), starting_dir=None):
    """List all files below `starting_dir` whose depth lies
    between `mindepth` and `maxdepth`.

    Depth is counted as in find: files directly inside
    `starting_dir` have depth 1. If `starting_dir` is `None`,
    the current working directory is used.

    """
    def _list_dir_inner(current_dir, current_depth):
        if current_depth > maxdepth:
            return
        dir_list = [os.path.relpath(os.path.join(current_dir, x))
                    for x in os.listdir(current_dir)]
        for item in dir_list:
            if os.path.isdir(item):
                _list_dir_inner(item, current_depth + 1)
            elif current_depth >= mindepth:
                result_list.append(item)

    if starting_dir is None:
        starting_dir = os.getcwd()

    result_list = []
    _list_dir_inner(starting_dir, 1)
    return result_list

EDIT: Added the corrections, reducing unnecessary variable definitions.

2nd Edit: Included PM 2Ring's suggestion to make it list the very same files as find, i.e. maxdepth is exclusive.

3rd EDIT: Added other remarks by 2Ring, also changed the path to relpath to return the output in the same format as find.

SmCaterpillar
  • Why on earth would you use a float for max depth - you cannot have a fraction of a level - I would use an integer value with the default of -1 and change the check. – Steve Barnes Feb 20 '15 at 08:37
  • @SteveBarnes: Fair point; I guess float('inf') kind of makes sense as there's not an equivalent integer infinity, although `sys.maxsize` would work ok. OTOH, I agree that using a sentinel is probably nicer, but rather than using -1 I'd use `None`, eg `if maxdepth is not None and current_depth > maxdepth:` – PM 2Ring Feb 20 '15 at 09:17
  • @2Ring, you can simply change that by using ``current_depth >= maxdepth`` and ``current_depth > min_depth`` to get the exact behavior of find. – SmCaterpillar Feb 20 '15 at 10:47
  • @SmCaterpillar: Almost there! I think you need `current_depth >= mindepth - 1`. Either that, or `current_depth >= maxdepth` and call it with `_list_dir_inner(starting_dir, 1)` – PM 2Ring Feb 20 '15 at 13:37
  • Ah, ok misunderstood the interpretation of mindepth, I thought, counting starts at 0, but instead 1 means "process all files except the command line arguments". – SmCaterpillar Feb 21 '15 at 09:30
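The `None` sentinel suggested in the comments could look roughly like the sketch below. It keeps the depth convention of the answer (files directly inside the starting directory have depth 1). The name `list_files`, the use of `os.scandir`, and the choice not to follow symbolic links are assumptions made for illustration, not part of the original answer.

```python
import os

def list_files(starting_dir='.', mindepth=0, maxdepth=None):
    """Return files under starting_dir; maxdepth=None means no upper limit."""
    result = []

    def _walk(current_dir, current_depth):
        # None acts as the "unlimited" sentinel, so depths stay plain integers.
        if maxdepth is not None and current_depth > maxdepth:
            return
        with os.scandir(current_dir) as entries:
            # Sort by name so the output order is deterministic.
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=False):
                    _walk(entry.path, current_depth + 1)
                elif (entry.is_file(follow_symlinks=False)
                      and current_depth >= mindepth):
                    result.append(entry.path)

    _walk(starting_dir, 1)
    return result
```

For example, `list_files('.', mindepth=3)` is intended to correspond to `find -mindepth 3 -type f`. Not following symbolic links avoids endless recursion on link cycles.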