210

I'm trying to write a simple Python script that will copy an index.tpl to index.html in all of the subdirectories (with a few exceptions).

I'm getting bogged down by trying to get the list of subdirectories.

the Tin Man
Steven Noble
  • 11
    You may find that the accepted answer at this earlier SO question solves the problem: http://stackoverflow.com/questions/120656/directory-listing-in-python – Jarret Hardie Apr 28 '09 at 23:01

16 Answers

250
import os
def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]
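
To tie this back to the question, here is a minimal sketch of how the helper might be used to copy index.tpl to index.html in each immediate subdirectory while skipping a few. It assumes each subdirectory holds its own index.tpl (the question is not explicit about this). The names copy_template and excluded are hypothetical and only introduced for illustration.

```python
import os
import shutil


def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]


def copy_template(a_dir, excluded=()):
    """Copy index.tpl to index.html in each immediate subdirectory of a_dir.

    `excluded` is a hypothetical parameter: directory names to skip.
    Returns the list of files that were written.
    """
    written = []
    for name in get_immediate_subdirectories(a_dir):
        if name in excluded:
            continue
        src = os.path.join(a_dir, name, "index.tpl")
        dst = os.path.join(a_dir, name, "index.html")
        if os.path.isfile(src):  # skip subdirectories without a template
            shutil.copy(src, dst)
            written.append(dst)
    return written
```

For example, copy_template('.', excluded={'.git', 'drafts'}) would process every subdirectory of the current directory except those two.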
PyNEwbie
RichieHindle
154

I speed-tested several ways of returning the full paths of all immediate subdirectories.

tl;dr: Always use scandir:

list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]

Bonus: With scandir you can also get just the folder names by using f.name instead of f.path.
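
As a small sketch of the f.name variant: os.scandir can also be used as a context manager (Python 3.6 and later), which closes the directory handle as soon as the list is built. The function name subfolder_names is made up for this example.

```python
import os


def subfolder_names(path):
    """Return the names (not full paths) of the immediate subfolders of path."""
    # The with-block closes the scandir iterator once the list is built.
    with os.scandir(path) as entries:
        return [entry.name for entry in entries if entry.is_dir()]


print(subfolder_names("."))
```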

This (as well as all other functions below) will not use natural sorting. This means results will be sorted like this: 1, 10, 2. To get natural sorting (1, 2, 10), please have a look at https://stackoverflow.com/a/48030307/2441026
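
One common way to get natural ordering is to split each name into digit and non-digit runs and compare the digit runs as integers. The sketch below illustrates that idea; it is not necessarily the approach taken in the linked answer, and natural_key and sorted_subfolders are hypothetical helper names.

```python
import os
import re


def natural_key(name):
    """Split 'dir10' into ['dir', 10, ''] so numeric parts compare as numbers."""
    return [int(part) if part.isdecimal() else part.lower()
            for part in re.split(r"(\d+)", name)]


def sorted_subfolders(path):
    """Immediate subfolder names of path in natural order."""
    with os.scandir(path) as entries:
        names = [entry.name for entry in entries if entry.is_dir()]
    return sorted(names, key=natural_key)


print(sorted(["1", "10", "2"], key=natural_key))  # → ['1', '2', '10']
```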




Results: scandir is 3x faster than walk, 32x faster than listdir (with filter), 35x faster than Pathlib, 36x faster than listdir and 37x (!) faster than glob.

Scandir:           0.977
Walk:              3.011
Listdir (filter): 31.288
Pathlib:          34.075
Listdir:          35.501
Glob:             36.277

Tested on Windows 7 x64 with Python 3.8.1, on a folder with 440 subfolders.
In case you wonder whether listdir could be sped up by not calling os.path.join() twice: yes, but the difference is basically nonexistent.

Code:

import os
import pathlib
import timeit
import glob

path = r"<example_path>"



def a():
    list_subfolders_with_paths = [f.path for f in os.scandir(path) if f.is_dir()]
    # print(len(list_subfolders_with_paths))


def b():
    list_subfolders_with_paths = [os.path.join(path, f) for f in os.listdir(path) if os.path.isdir(os.path.join(path, f))]
    # print(len(list_subfolders_with_paths))


def c():
    list_subfolders_with_paths = []
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            list_subfolders_with_paths.append( os.path.join(root, dir) )
        break
    # print(len(list_subfolders_with_paths))


def d():
    list_subfolders_with_paths = glob.glob(path + '/*/')
    # print(len(list_subfolders_with_paths))


def e():
    list_subfolders_with_paths = list(filter(os.path.isdir, [os.path.join(path, f) for f in os.listdir(path)]))
    # print(len(list(list_subfolders_with_paths)))


def f():
    p = pathlib.Path(path)
    list_subfolders_with_paths = [x for x in p.iterdir() if x.is_dir()]
    # print(len(list_subfolders_with_paths))



print(f"Scandir:          {timeit.timeit(a, number=1000):.3f}")
print(f"Listdir:          {timeit.timeit(b, number=1000):.3f}")
print(f"Walk:             {timeit.timeit(c, number=1000):.3f}")
print(f"Glob:             {timeit.timeit(d, number=1000):.3f}")
print(f"Listdir (filter): {timeit.timeit(e, number=1000):.3f}")
print(f"Pathlib:          {timeit.timeit(f, number=1000):.3f}")
user136036
79

Why has no one mentioned glob? glob lets you use Unix-style pathname expansion, and it is my go-to function for almost everything that needs to find more than one path name. It makes this very easy:

from glob import glob
paths = glob('*/')

Note that glob will return the directory with the final slash (as unix would) while most path based solutions will omit the final slash.
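
If you want the paths without the trailing separator, one option (also discussed in the comments) is to strip it afterwards. A minimal sketch, where subdirs_without_slash is a hypothetical name and the pattern is built with os.path.join so that it also works for a parent directory other than the current one:

```python
import glob
import os


def subdirs_without_slash(parent):
    """Immediate subdirectories of parent, without the trailing separator."""
    # A pattern ending in a separator makes glob match directories only.
    pattern = os.path.join(glob.escape(parent), "*", "")
    # rstrip (not strip) so a leading "/" on absolute paths is preserved.
    return [p.rstrip(os.sep) for p in glob.glob(pattern)]


print(subdirs_without_slash("."))
```

Note that "*" does not match names starting with a dot, so hidden directories are left out.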

ari
  • 3
    Good solution, simple and works. Those who don't want that final slash can use this: `paths = [ p.replace('/', '') for p in glob('*/') ]`. – Evan Hu Jun 28 '15 at 10:16
  • 5
    It might be safer to simply cut the last character with `[p[:-1] for p in paths]`, as that replace method will also replace any escaped forward slashes in the file name (not that those are common). – ari Nov 06 '15 at 01:34
  • 3
    Even safer, use strip('/') to remove trailing slashes. This way guarantees that you don't cut out any characters that are not forward slashes – Eliezer Miron Jul 07 '16 at 16:38
  • 8
    By construction you are guaranteed to have a trailing slash (so it's not safer), but I do think it's more readable. You definitely want to use `rstrip` instead of `strip`, though, since the latter will turn any fully qualified paths into relative paths. – ari Jul 07 '16 at 18:28
  • 7
    complement to @ari comment for python newbies such as I : `strip('/')` will remove both starting and trailing '/', `rstrip('/')` will remove only the trailing one – Titou Sep 16 '16 at 10:57
  • 1
    This also works with `pathlib`... you can do `subdirs = pathlib.Path('/foo/bar').glob('*/')` – s3cur3 Jun 19 '19 at 15:45
  • @s3cur3 You're right, nowadays I would usually use pathlib. I didn't back then because 1) I was stuck using python 2.7 and 2) pathlib hadn't come out yet – ari Jul 05 '23 at 13:17
  • 1
    @s3cur3 Actually, this won't work in python versions earlier than 3.11 due to a bug in pathlib: https://github.com/python/cpython/issues/66472 – ari Jul 05 '23 at 14:11
43

Check "Getting a list of all subdirectories in the current directory".

Here's a Python 3 version:

import os

dir_list = next(os.walk('.'))[1]

print(dir_list)
Community
Geng Jiawen
  • 2
    **Extremely clever.** While efficiency doesn't matter (_...it totally does_), I'm curious as to whether this or the glob-based generator expression `(s.rstrip("/") for s in glob(parent_dir+"*/"))` is more time efficient. My intuitive suspicion is that a `stat()`-based `os.walk()` solution *should* be profoundly faster than shell-style globbing. Sadly, I lack the will to `timeit` and actually find out. – Cecil Curry Jul 27 '17 at 08:01
  • 3
    Note that this returns the subdirectory names without the parent directory name prefixed to it. – Paul Chernoch Aug 11 '17 at 04:15
  • exactly what I was looking for....just the directory names without the path- @Cecil I tried your approach but could not get it to work. Tried `tmplist=(s.rstrip("/") for s in glob(tmp+"*/"))`, where tmp is my parent directory. It returns `<generator object <genexpr> at 0x0000018A88EE3660>`. What am I doing wrong? – jrive Apr 01 '21 at 20:09
20
import os

To get (full-path) immediate sub-directories in a directory:

def SubDirPath (d):
    return filter(os.path.isdir, [os.path.join(d,f) for f in os.listdir(d)])

To get the latest (newest) sub-directory:

def LatestDirectory (d):
    return max(SubDirPath(d), key=os.path.getmtime)
Milan
13

os.walk is your friend in this situation.

Straight from the documentation:

walk() generates the file names in a directory tree, by walking the tree either top down or bottom up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
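
To make the 3-tuple concrete, here is a small sketch that walks a tree top-down and skips some directories, which may help with the "few exceptions" in the question. Removing names from dirnames in place is documented os.walk behaviour for top-down walks; the names walk_subdirs and skip are hypothetical.

```python
import os


def walk_subdirs(top, skip=()):
    """Yield the path of every subdirectory under top, pruning names in skip."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for d in dirnames:
            yield os.path.join(dirpath, d)
```

Calling list(walk_subdirs('.', skip={'.git'})) would then list every subdirectory below the current one, without entering any directory named .git.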

the Tin Man
Andrew Cox
  • 1
    Just be aware that if you only want the first-level subdirectories then break out of the os.walk iteration after the first set of return values. – yoyo Nov 27 '15 at 03:54
12

This method nicely does it all in one go.

from glob import glob
subd = [s.rstrip("/") for s in glob(parent_dir+"*/")]
SuaveSouris
7

Using Twisted's FilePath module:

from twisted.python.filepath import FilePath

def subdirs(pathObj):
    for subpath in pathObj.walk():
        if subpath.isdir():
            yield subpath

if __name__ == '__main__':
    for subdir in subdirs(FilePath(".")):
        print("Subdirectory:", subdir)

Since some commenters have asked what the advantages of using Twisted's libraries for this are, I'll go a bit beyond the original question here.


There's some improved documentation in a branch that explains the advantages of FilePath; you might want to read that.

More specifically in this example: unlike the standard library version, this function can be implemented with no imports. The "subdirs" function is totally generic, in that it operates on nothing but its argument. In order to copy and move the files using the standard library, you need to depend on the "open" builtin, "listdir", perhaps "isdir" or "os.walk" or "shutil.copy". Maybe "os.path.join" too. Not to mention the fact that you need a string passed as an argument to identify the actual file. Let's take a look at the full implementation, which will copy each directory's "index.tpl" to "index.html":

def copyTemplates(topdir):
    for subdir in subdirs(topdir):
        tpl = subdir.child("index.tpl")
        if tpl.exists():
            tpl.copyTo(subdir.child("index.html"))

The "subdirs" function above can work on any FilePath-like object. Which means, among other things, ZipPath objects. Unfortunately ZipPath is read-only right now, but it could be extended to support writing.

You can also pass your own objects for testing purposes. In order to test the os.path-using APIs suggested here, you have to monkey with imported names and implicit dependencies and generally perform black magic to get your tests to work. With FilePath, you do something like this:

class MyFakePath:
    def child(self, name):
        "Return an appropriate child object"

    def walk(self):
        "Return an iterable of MyFakePath objects"

    def exists(self):
        "Return true or false, as appropriate to the test"

    def isdir(self):
        "Return true or false, as appropriate to the test"
...
subdirs(MyFakePath(...))
Glyph
  • Since I have little exposure to Twisted, I always welcome additional info and examples; this answer is nice to see for that. Having said that, since this approach appears to require substantially more work than using the built-in python modules, and a Twisted install, are there any advantages to using this that you could add to the answer? – Jarret Hardie Apr 28 '09 at 23:22
  • 1
    Glyph's answer was probably inspired by the fact that TwistedLore also uses .tpl files. – Constantin Apr 29 '09 at 00:15
  • Well, clearly I don't expect the Spanish inquisition :-) I assumed "*.tpl" was a generic reference to some abstract extension meaning "template", and not a specific Twisted template (I've seen .tpl used in many languages after all). Good to know. – Jarret Hardie Apr 29 '09 at 01:33
  • +1 therefore for twigging to the possible Twisted angle, though I'd still like to understand what Twisted's 'FilePath' object and 'walk()' function add to the standard API. – Jarret Hardie Apr 29 '09 at 01:36
  • Personally I find "FilePath.walk() yields path objects" a lot easier to remember than "os.walk yields 3-tuples of dir, dirs, files". But there are other benefits. FilePath allows for polymorphism, which means you can traverse things other than filesystems. For example, you could pass a twisted.python.zippath.ZipArchive to my 'subdirs' function and get a generator of ZipPaths out instead of FilePaths; your logic doesn't change, but your application now magically handles zip files. If you want to test it, you just have to supply an object, you don't have to write real files. – Glyph Apr 30 '09 at 18:24
4

I just wrote some code to move VMware virtual machines around, and ended up using os.path and shutil to copy files between sub-directories.

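import os
import shutil

def copy_client_files(file_src, file_dst):
    # copy every entry of file_src into file_dst (assumes they are regular files)
    for file in os.listdir(file_src):
        print("Copying file: %s" % file)
        shutil.copy(os.path.join(file_src, file), os.path.join(file_dst, file))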

It's not terribly elegant, but it does work.

the Tin Man
2
import os

def get_folders_in_directories_recursively(directory, index=0):
    folder_list = list()
    parent_directory = directory

    for path, subdirs, _ in os.walk(directory):
        if not index:
            for sdirs in subdirs:
                folder_path = "{}/{}".format(path, sdirs)
                folder_list.append(folder_path)
        elif path[len(parent_directory):].count('/') + 1 == index:
            for sdirs in subdirs:
                folder_path = "{}/{}".format(path, sdirs)
                folder_list.append(folder_path)

    return folder_list

The function above can be called as follows:

get_folders_in_directories_recursively(directory, index=1) -> gives the list of folders in the first level

get_folders_in_directories_recursively(directory) -> gives all the subfolders

Kanish Mathew
1

I have to mention the path.py library, which I use very often.

Fetching the immediate subdirectories becomes as simple as this:

my_dir.dirs()

The full working example is:

from path import Path

my_directory = Path("path/to/my/directory")

subdirs = my_directory.dirs()

NB: my_directory can still be manipulated as a string, since Path is a subclass of str that adds a bunch of useful methods for manipulating paths.

olinox14
1

Here's one way:

import os
import shutil

def copy_over(path, from_name, to_name):
  for path, dirname, fnames in os.walk(path):
    for fname in fnames:
      if fname == from_name:
        shutil.copy(os.path.join(path, from_name), os.path.join(path, to_name))


copy_over('.', 'index.tpl', 'index.html')
Scott Kirkwood
  • -1: won't work, since shutil.copy will copy to the current dir, so you'll end up overwriting 'index.html' in the current dir once for each 'index.tpl' you find in the subdirectory tree. – nosklo Apr 29 '09 at 13:35
0
import glob
import os

def child_dirs(path):
    cd = os.getcwd()            # save the current working directory
    os.chdir(path)              # change directory
    try:
        return glob.glob("*/")  # get all the subdirectories
    finally:
        os.chdir(cd)            # always restore the original working directory

The child_dirs function takes the path to a directory and returns a list of the immediate subdirectories in it.

dir
 |
  -- dir_1
  -- dir_2

child_dirs('dir') -> ['dir_1/', 'dir_2/']  (glob keeps the trailing slash)
Amjad
0
import pathlib


def list_dir(directory):
    path = pathlib.Path(directory)
    subdirs = []
    try:
        for item in path.iterdir():
            if item.is_dir():
                subdirs.append(item)
        return subdirs
    except FileNotFoundError:
        print('Invalid directory')
Yossarian42
0

One-liner using pathlib:

list_subfolders_with_paths = [p for p in pathlib.Path(path).iterdir() if p.is_dir()]
bsimpson53
0

You can try this:

import os

# os.walk yields (dirpath, dirnames, filenames) tuples; the first tuple
# describes the top directory, so its dirnames are the immediate subdirectories.
_, subdirs, _ = next(os.walk("D:\\"))

# display the list
for name in subdirs:
    print(name)
Suraj Rao
lish