135

How do I limit os.walk to only return files in the directory I provide it?

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
    return outputList
Honest Abe
Setori
  • 3
    Another case where the multitude of possible approaches and all the caveats that go with them suggests that this functionality should be added to the Python standard library. – antred Oct 31 '16 at 19:26
  • `files_with_full_path = [f.path for f in os.scandir(dir) if f.is_file()]`. In case you need only the filenames use `f.name` instead of `f.path`. This is the fastest solution and much faster than any `walk` or `listdir`, see https://stackoverflow.com/a/40347279/2441026. – user136036 Jan 24 '20 at 13:08

21 Answers

243

Don't use os.walk.

Example:

import os

root = "C:\\"
for item in os.listdir(root):
    if os.path.isfile(os.path.join(root, item)):
        print(item)
Yuval Adam
  • 2
    @576i: this does not differentiate between files and directories –  Jun 03 '16 at 09:07
  • 4
    @Alexandr `os.path.isfile` and `os.path.isdir` lets you differentiate. I don't get it, since `os.path.isfile` is in the sample code since '08 and your comment is from '16. This is clearly the better answer, as you're not intending to walk a directory, but to list it. – Daniel F Aug 29 '17 at 08:17
  • @DanielF, what I meant here is that you need to loop over all items, while `walk` gives you immediately the separate lists of dirs and files. –  Aug 29 '17 at 13:09
  • Ah, ok. Actually Alex's answer seems to be better (using `.next()`) and it's much closer to your idea. – Daniel F Aug 29 '17 at 13:54
  • 1
    Python 3.5 has a `os.scandir` function which allows more sophisticated file-or-directory-object interaction. See [my answer](https://stackoverflow.com/a/56325893/3104974) below – ascripter May 27 '19 at 12:32
118

You can define a walklevel function such as this one:

import os

def walklevel(some_dir, level=1):
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        num_sep_this = root.count(os.path.sep)
        if num_sep + level <= num_sep_this:
            del dirs[:]

It works just like os.walk, but you can pass it a level parameter that indicates how deep the recursion will go.
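
As a rough illustration of what level means here: level=0 stops at the directory itself, while the default level=1 also visits its immediate subdirectories. The demo helper and the temporary top/a/b tree are assumptions made up for this sketch, not part of the answer.

```python
import os
import tempfile

def walklevel(some_dir, level=1):
    # Same logic as the function above: prune once `level` is reached.
    some_dir = some_dir.rstrip(os.path.sep)
    assert os.path.isdir(some_dir)
    num_sep = some_dir.count(os.path.sep)
    for root, dirs, files in os.walk(some_dir):
        yield root, dirs, files
        if num_sep + level <= root.count(os.path.sep):
            del dirs[:]  # emptied in place, so os.walk descends no further

def demo(level):
    # Hypothetical helper: build top/a/b and list the directories visited.
    with tempfile.TemporaryDirectory() as tmp:
        top = os.path.join(tmp, "top")
        os.makedirs(os.path.join(top, "a", "b"))
        return [os.path.relpath(root, top) for root, _, _ in walklevel(top, level)]
```

So for the original question (files in the given directory only), level=0 is probably the value you want.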

nosklo
  • 3
    Does this function actually "walk" through the whole structure and then delete the entries below a certain point? Or is something more clever going on? I'm not even sure how to check this with code. --python beginner – mathtick Aug 19 '10 at 18:05
  • 1
    @mathtick: when some directory on or below the desired level is found, all of its subdirs are removed from the list of subdirs to search next. So they won't be "walked". – nosklo Aug 19 '10 at 19:41
  • 2
    I just +1'd this because I was struggling with how to "delete" dirs. I had tried `dirs = []` and `dirs = None` but those didn't work. `map(dirs.remove, dirs)` worked, but with some unwanted '[None]' messages printed. So, why `del dirs[:]` specifically? – Zach Young Oct 12 '12 at 00:53
  • great answer. +1'd just because it works with any code using `os.walk`. – idanshmu Dec 10 '14 at 13:09
  • Great function - really useful – Doron Shai Oct 21 '15 at 14:37
  • 4
    Note that this doesn't work when using `topdown=False` in os.walk. See the 4th paragraph in the [docs](https://docs.python.org/3.4/library/os.html?highlight=os.walk#os.walk): `Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.` – dthor Feb 24 '16 at 21:58
  • I love this one – codyc4321 Aug 21 '17 at 19:03
  • 3
    @ZacharyYoung `dirs = []` and `dirs = None` won't work because they just create a new unrelated object and assign to the name `dirs`. The original list object needs to be modified in-place, not the name `dirs`. – nosklo Oct 01 '18 at 16:34
  • How can I print the dirs of level 1 in this python script given a valid folder? Wait, I have to learn to use **yield**. – Timo Dec 24 '20 at 18:03
  • This is a great (really!) answer to another question. Therefore the amount of code is much higher than required. Other answers in this thread do the same thing for the specific request with much less code. – mgueydan Apr 30 '21 at 14:50
69

I think the solution is actually very simple: use break so that only the first iteration of the for loop runs. There may be a more elegant way, but this works.

for root, dirs, files in os.walk(dir_name):
    for f in files:
        ...
        ...
    break
...

os.walk is a generator. With the default topdown=True, the first tuple it yields describes the directory you passed in; each later iteration yields the contents of another directory further down the tree.

Take the original script and just add a break.

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                outputList.append(os.path.join(root, f))
            else:
                self._email_to_("ignore")
        break
    return outputList
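
If you want to try the same idea outside a class, a minimal standalone sketch might look like this. The names list_top_level_files and demo are made up for illustration, and the self._email_to_ call is left out.

```python
import os
import tempfile

def list_top_level_files(dir_name, whitelist):
    matches = []
    for root, dirs, files in os.walk(dir_name):
        for f in files:
            if os.path.splitext(f)[1] in whitelist:
                matches.append(os.path.join(root, f))
        break  # stop after the first tuple, which describes dir_name itself
    return matches

def demo():
    # Hypothetical helper: a.txt and b.py at the top, c.txt one level down.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        for rel in ("a.txt", "b.py", os.path.join("sub", "c.txt")):
            open(os.path.join(tmp, rel), "w").close()
        found = list_top_level_files(tmp, {".txt"})
        return [os.path.relpath(p, tmp) for p in found]
```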
Pieter
  • 11
    This should have been the accepted answer. Simply adding a "break" after the "for f in files" loop stops the recursiveness. You might also want to make sure that topdown=True. – Alecz Oct 31 '16 at 19:41
  • 2
    I just want to add this comment and say thank you for saving me time at work for giving such a good simplistic answer. – Steven Marsh Jul 15 '21 at 18:52
  • same here. It's simple and imho straight forward. I'm just wondering if this behavior is in the function specification. – Tomsim Jun 20 '23 at 23:35
28

The suggestion to use listdir is a good one. The direct answer to your question in Python 2 is root, dirs, files = os.walk(dir_name).next().

The equivalent Python 3 syntax is root, dirs, files = next(os.walk(dir_name))
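
One caveat worth sketching: by default os.walk silently ignores errors from listing a directory, so for a missing directory the generator yields nothing and a bare next() raises StopIteration. Passing a default to next() is one way around that. The names top_level and demo below are hypothetical.

```python
import os
import tempfile

def top_level(dir_name):
    # Return (root, dirs, files) for dir_name only, without recursing.
    # The second argument to next() is used when os.walk yields nothing,
    # for example because dir_name does not exist.
    return next(os.walk(dir_name), (dir_name, [], []))

def demo():
    # Hypothetical helper: one file and one subdirectory at the top level.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        open(os.path.join(tmp, "a.txt"), "w").close()
        open(os.path.join(tmp, "sub", "b.txt"), "w").close()
        root, dirs, files = top_level(tmp)
        missing = top_level(os.path.join(tmp, "no_such_dir"))
    return dirs, files, missing[1:]
```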

CervEd
Alex Coventry
  • 1
    Oh i was getting all sort of funny error from that one. ValueError: too many values to unpack – Setori Oct 24 '08 at 01:34
  • 1
    Nice! Feels like a hack, though. Like when you turn on an engine but only let it do one revolution and then pull the key to let it die. – Daniel F Aug 29 '17 at 08:24
  • Stumbled across this; `root, dirs, files = os.walk(dir_name).next()` gives me `AttributeError: 'generator' object has no attribute 'next'` – Evan Nov 27 '18 at 00:51
  • 3
    @Evan, probably because this is from 2008 and uses Python 2 syntax. In Python 3 you can write `root, dirs, files = next(os.walk(dir_name))` and then the variables `root, dirs, files` will only correspond to the variables of the generator at the `dir_name` level. – CervEd Mar 01 '19 at 12:12
15

You could use os.listdir() which returns a list of names (for both files and directories) in a given directory. If you need to distinguish between files and directories, call os.stat() on each name.
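
A minimal sketch of that approach, assuming you want files and directories in separate lists (split_entries and demo are made-up names). Note that os.listdir returns bare names, so each one has to be joined with the directory before calling os.stat.

```python
import os
import stat
import tempfile

def split_entries(dir_name):
    files, dirs = [], []
    for name in os.listdir(dir_name):
        # os.stat follows symlinks and raises OSError for a broken link.
        mode = os.stat(os.path.join(dir_name, name)).st_mode
        if stat.S_ISDIR(mode):
            dirs.append(name)
        elif stat.S_ISREG(mode):
            files.append(name)
    return sorted(files), sorted(dirs)

def demo():
    # Hypothetical helper: one regular file and one subdirectory.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        open(os.path.join(tmp, "a.txt"), "w").close()
        return split_entries(tmp)
```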

Greg Hewgill
10

If you have more complex requirements than just the top directory (e.g. ignoring VCS directories), you can also modify the list of directories in place to prevent os.walk from recursing into them.

For example:

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        dirs[:] = [d for d in dirs if is_good(d)]
        for f in files:
            do_stuff()

Note - be careful to mutate the list, rather than just rebind it. Obviously os.walk doesn't know about the external rebinding.
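
To make the difference concrete, here is a small sketch that walks the same tree twice, once pruning in place and once rebinding the name. The visited and demo helpers and the .git/src layout are assumptions for illustration only.

```python
import os
import tempfile

def visited(top, prune_in_place):
    seen = []
    for root, dirs, files in os.walk(top):
        seen.append(os.path.relpath(root, top))
        if prune_in_place:
            # Slice assignment changes the list object os.walk holds.
            dirs[:] = [d for d in dirs if d != ".git"]
        else:
            # Rebinding creates a new list; os.walk never sees it.
            dirs = [d for d in dirs if d != ".git"]
    return sorted(seen)

def demo():
    # Hypothetical tree: top/.git/objects and top/src.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, ".git", "objects"))
        os.makedirs(os.path.join(top, "src"))
        return visited(top, True), visited(top, False)
```

Only the in-place version keeps os.walk out of .git; the rebinding version still visits it and everything below it.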

martineau
Brian
8
for path, dirs, files in os.walk('.'):
    print(path, dirs, files)
    del dirs[:] # go only one level deep
masterxilo
5

Felt like throwing my 2 pence in.

baselevel = len(rootdir.split(os.path.sep))
for subdirs, dirs, files in os.walk(rootdir):
    curlevel = len(subdirs.split(os.path.sep))
    if curlevel <= baselevel + 1:  # rootdir and its immediate subdirectories; drop the "+ 1" for rootdir only
        pass  # do stuff with files here
Matt R
4

The same idea with listdir, but shorter:

[f for f in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, f))]
Dmitriy Simushev
Oleg Gryb
4

Since Python 3.5 you can use os.scandir instead of os.listdir. Instead of strings you get an iterator of DirEntry objects in return. From the docs:

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because DirEntry objects expose this information if the operating system provides it when scanning a directory. All DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.

You can access the name of the object via DirEntry.name which is then equivalent to the output of os.listdir
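
A short sketch of how this could look for the original question, assuming Python 3.6 or later (where os.scandir can be used as a context manager). The files_in function, its optional whitelist parameter, and the demo helper are made up for illustration.

```python
import os
import tempfile

def files_in(dir_name, whitelist=None):
    # Full paths of regular files directly inside dir_name, optionally
    # restricted to the extensions in `whitelist`.
    with os.scandir(dir_name) as entries:
        return sorted(
            e.path for e in entries
            if e.is_file()
            and (whitelist is None or os.path.splitext(e.name)[1] in whitelist)
        )

def demo():
    # Hypothetical helper: a.txt and b.py at the top, c.txt one level down.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        for rel in ("a.txt", "b.py", os.path.join("sub", "c.txt")):
            open(os.path.join(tmp, rel), "w").close()
        everything = [os.path.basename(p) for p in files_in(tmp)]
        only_py = [os.path.basename(p) for p in files_in(tmp, {".py"})]
        return everything, only_py
```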

ascripter
  • 2
    Not only "can" you use, you **should** use `scandir()`, as it's a *lot* faster than `listdir()`. See benchmarks here: https://stackoverflow.com/a/40347279/2441026. – user136036 Jan 24 '20 at 13:02
2

You could also do the following:

for path, subdirs, files in os.walk(dir_name):
    for name in files:
        if path == dir_name:  # keep only files directly inside dir_name ("." if that is what you passed)
            pass  # code here
Diana G
2

In Python 3, I was able to do this:

import os
dir = "/path/to/files/"

#List all files immediately under this folder:
print ( next( os.walk(dir) )[2] )

#List all folders immediately under this folder:
print ( next( os.walk(dir) )[1] )
Jay Sheth
2

The root folder changes for every directory os.walk finds. I solved that by checking whether root == dir_name.

def _dir_list(self, dir_name, whitelist):
    outputList = []
    for root, dirs, files in os.walk(dir_name):
        if root == dir_name:  # only true for the top-level folder
            for f in files:
                if os.path.splitext(f)[1] in whitelist:
                    outputList.append(os.path.join(root, f))
                else:
                    self._email_to_("ignore")
    return outputList
Pedro J. Sola
1
import os

def listFiles(self, dir_name):
    names = []
    for root, directory, files in os.walk(dir_name):
        if root == dir_name:
            for name in files:
                names.append(name)
    return names
Rich
  • 1
    Hi Rich, welcome to Stack Overflow! Thank you for this code snippet, which might provide some limited short-term help. A proper explanation [would greatly improve](https://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please [edit](https://stackoverflow.com/posts/24187229/edit) your answer to add some explanation, including the assumptions you've made. – kenny_k Sep 30 '19 at 20:44
0

This is how I solved it

if recursive:
    items = os.walk(target_directory)
else:
    items = [next(os.walk(target_directory))]

...
Deifyed
0

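There is a catch when using listdir: it returns bare names, so the argument to os.path.isdir must be a path that actually resolves to the entry, not just its name. To pick out subdirectories, join each name with the parent directory: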

for dirname in os.listdir(rootdir):
  if os.path.isdir(os.path.join(rootdir, dirname)):
     print("I got a subdirectory: %s" % dirname)

The alternative is to change into the directory first (os.chdir), so that the test works without os.path.join().

Kemin Zhou
0

You can use this snippet

level = 1  # how many directories to visit; 1 means only `directory` itself
for root, dirs, files in os.walk(directory):
    if level > 0:
        pass  # do some stuff
    else:
        break
    level -= 1
alexandre-rousseau
0

Create a list of excludes and use fnmatch to skip the matching parts of the directory structure while processing:

excludes = [r'a\*\b', r'c\d\e']  # raw strings, so that \b is not read as a backspace
for root, directories, files in os.walk('Start_Folder'):
    if not any(fnmatch.fnmatch(nf_root, pattern) for pattern in excludes):
        for root, directories, files in os.walk(nf_root):
            ....
            do the process
            ....

The same works for 'includes':

if any(fnmatch.fnmatch(nf_root, pattern) for pattern in includes):
Hamsavardhini
0

Why not simply combine range and os.walk using zip? It is not the best solution, but it works too.

For example like this:

# your part before
for count, (root, dirs, files) in zip(range(0, 1), os.walk(dir_name)):
    # logic stuff
# your later part

Works for me on Python 3.

Also, a break is simpler. (See the answer from @Pieter.)

PiMathCLanguage
0

A slight change to Alex's answer, using either next() or __next__():

print(next(os.walk('d:/'))[2])

or

print(os.walk('d:/').__next__()[2])

Here [2] selects files from the (root, dirs, files) tuple mentioned in other answers.

Oleg
0

This is a nice python example

def walk_with_depth(root_path, depth):
        if depth < 0:
            for root, dirs, files in os.walk(root_path):
                yield [root, dirs[:], files]

            return

        elif depth == 0:
            return

        base_depth = root_path.rstrip(os.path.sep).count(os.path.sep)
        for root, dirs, files in os.walk(root_path):
            yield [root, dirs[:], files]

            cur_depth = root.count(os.path.sep)
            
            if base_depth + depth <= cur_depth:
                del dirs[:]
Alon Barad