161

I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

But this cannot access the subfolders as well. Does anyone know how I can use the same command to access subfolders as well?

fish2000
UserYmY
  • 1
    related to: [Use a Glob() to find files recursively in Python](http://stackoverflow.com/q/2186525/1463143) – samkhan13 Jun 10 '13 at 18:56
  • 1
    Does this answer your question? [How to use glob() to find files recursively?](https://stackoverflow.com/questions/2186525/how-to-use-glob-to-find-files-recursively) – Basj Dec 03 '20 at 21:34

13 Answers

241

In Python 3.5 and newer use the new recursive **/ functionality:

configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

When recursive is set, ** followed by a path separator matches 0 or more subdirectories.

In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.

In that case I'd use os.walk() combined with fnmatch.filter() instead:

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]

This walks your directories recursively and returns the full path of every matching .txt file (absolute here, because path is absolute). In this specific case fnmatch.filter() may be overkill; you could also use a .endswith() test:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in files if f.endswith('.txt')]
Martijn Pieters
  • 4
    I can see: **glob.glob('/path to directory/*/*.txt')** working for me. This is basically using the Unix shell rule. – Surya May 15 '16 at 21:09
  • 10
    @User123: that doesn't list directories *recursively*. You are listing all text files *one level deep*, but not in further subdirectories or even directly in `path to directory`. – Martijn Pieters May 15 '16 at 21:21
  • 2
    This is not completely related, but why does setting `recursive=False` together with the `**/` functionality not provide the list of files just in the given folder, but rather in its children? – Dr_Zaszuś Jun 18 '18 at 15:08
  • @Dr_Zaszuś: sorry? `**/` gives a list of *directory names* in the current working directory, because the pattern ends in `/`, and with `recursive=False` you basically have a double `*`, matching just the same as `*/`, just less efficient. – Martijn Pieters Jun 18 '18 at 15:30
  • 1
    @Dr_Zaszuś: use `*/*` if you need all files in all subdirectories. – Martijn Pieters Jun 18 '18 at 15:31
  • @MartijnPieters Yes, thank you. The unexpected behavior is however that with `recursive=True` you get all files in all subfolders recursively + in the given folder, but with `False` you counter-intuitively (at least in my opinion) get only files in the first-level children and NOT in the given folder. The more intuitive is the shell way, when if you don't give the recursive flag to `find` you will only get results in the current folder. – Dr_Zaszuś Jun 19 '18 at 17:27
  • @Dr_Zaszuś: Well, `**` only has special recursive-directory-matching behaviour when `recursive=True` is set. Python's glob module did not support *any* recursive matching for a *long* time and there's a lot of code out there that may have accidentally used `**` in patterns, and the Python core developers did not want to break such code. So `recursive=True` is needed to explicitly switch on the special pattern. Without it, `**` is just two separate `*` patterns, each matching zero or more characters. You should read the pattern as if there is just one `*` in it, and that's how it'll behave. – Martijn Pieters Jun 19 '18 at 17:52
  • `recursive` is confusing, I thought it could traverse the current directory and all its subdirectories. – CodingNinja Sep 21 '22 at 11:27
  • @CodingNinja: you mean the `recursive=False` keyword option? It *can* traverse the current directory and all its subdirectories **provided you give it a recursive glob**. The flag enables or disables recognition of the `**` syntax, and for forward compatibility reasons *and* security reasons it is off by default. The documentation is very clear on what it does however. – Martijn Pieters Oct 07 '22 at 11:19
81

There's a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):

  1. glob.glob('*.txt') :matches all files ending in '.txt' in the current directory
  2. glob.glob('*/*.txt') :matches all files ending in '.txt' in the immediate subdirectories only, but not in the current directory
  3. glob.glob('**/*.txt') :same as 2
  4. glob.glob('*.txt',recursive=True) :same as 1
  5. glob.glob('*/*.txt',recursive=True) :same as 2
  6. glob.glob('**/*.txt',recursive=True):matches all files ending in '.txt' in the current directory and in all subdirectories

So it's best to always specify recursive=True.
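The cases above can be checked against a throwaway directory tree. This is only a minimal sketch: the file names (`top.txt`, `sub/mid.txt`, `sub/deep/low.txt`) and the helper `matches` are made up for illustration, and results are sorted and normalised to forward slashes so they read the same on Windows and POSIX.

```python
import glob
import os
import tempfile
from pathlib import Path


def matches(root, pattern, recursive=False):
    """Return glob results relative to root, sorted, with forward slashes."""
    # glob.escape() keeps any special characters in root from being
    # interpreted as part of the pattern.
    full_pattern = os.path.join(glob.escape(root), pattern)
    found = glob.glob(full_pattern, recursive=recursive)
    return sorted(Path(p).relative_to(root).as_posix() for p in found)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one .txt file at each of three depths.
    for rel in ("top.txt", "sub/mid.txt", "sub/deep/low.txt"):
        target = Path(root, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("example\n")

    current_only = matches(root, "*.txt")
    one_level = matches(root, "*/*.txt")
    double_star_plain = matches(root, "**/*.txt")
    everything = matches(root, "**/*.txt", recursive=True)

print(current_only)
print(one_level)
print(double_star_plain)
print(everything)
```

Without recursive=True the `**` pattern gives the same result as `*/*.txt`; only the last call returns all three files.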

germ
  • 1
    I am not sure if case 3. is correct. I tried with pathlib.Path.glob and the pattern returns the same result as case 6. (all txt files recursively). This is also what the current favourite answer mentions (https://stackoverflow.com/a/14798263/7919597) – Joe Oct 05 '22 at 07:41
26

To find files in immediate subdirectories:

configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')

For a recursive version that traverses all subdirectories, you can use ** and pass recursive=True (Python 3.5 and newer):

configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)

Both function calls return lists. You could use glob.iglob() to return paths one by one. Or use pathlib:

from pathlib import Path

path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir

Both methods return iterators (you can get paths one by one).
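Since the question is about printing lines from the files that are found, here is a rough sketch of consuming the rglob() iterator one path at a time. The helper name `head` and its `max_lines` parameter are hypothetical, and the demo builds its own temporary directory instead of using the Desktop path from the question.

```python
import tempfile
from itertools import islice
from pathlib import Path


def head(root, pattern="*.txt", max_lines=2):
    """Yield (path, first lines) for every file under root matching pattern."""
    for path in Path(root).rglob(pattern):
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as fh:
                # islice stops reading after max_lines lines.
                yield path, [line.rstrip("\n") for line in islice(fh, max_lines)]


with tempfile.TemporaryDirectory() as root:
    # Hypothetical sample files at two depths.
    Path(root, "a.txt").write_text("one\ntwo\nthree\n", encoding="utf-8")
    Path(root, "sub").mkdir()
    Path(root, "sub", "b.txt").write_text("alpha\n", encoding="utf-8")

    result = {path.relative_to(root).as_posix(): lines
              for path, lines in head(root)}

print(result)
```

Because both rglob() and `head` are lazy, only one file is open at any time, which matters when the tree is large.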

jfs
  • Yes, I understood that; but I didn't expect `glob()` to support patterns in directories either. – Martijn Pieters Feb 10 '13 at 13:57
  • Comment deleted, I see now that it gave the wrong impression; besides, the patch includes a documentation update for the `**` recursion case. But for `**` to work, you *have* to set the `recursive=True` switch, btw. – Martijn Pieters Feb 10 '13 at 14:53
17

The glob2 package supports wildcards and is reasonably fast:

import timeit

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)

On my laptop it takes approximately 2 seconds to match >60,000 file paths.

megawac
9

You can use Formic with Python 2.6

import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

Disclosure - I am the author of this package.

Andrew Alcock
4

Here is an adapted version that provides glob.glob-like functionality without using glob2.

import fnmatch
import os

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            # Match the pattern against the whole path, not just the file name
            if fnmatch.fnmatch(full_path, pattern):
                matches.append(full_path)
    return matches

So if you have the following dir structure

tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1

You can do something like this

files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

In effect, the fnmatch pattern is matched against the full path rather than against the file name only.

cevaris
4

(The first options are of course mentioned in other answers; the goal here is to show that glob uses os.scandir internally, and to provide a direct answer with it.)


Using glob

As explained before, with Python 3.5+, it's easy:

import glob
for f in glob.glob('d:/temp/**/*', recursive=True):
    print(f)

#d:\temp\New folder
#d:\temp\New Text Document - Copy.txt
#d:\temp\New folder\New Text Document - Copy.txt
#d:\temp\New folder\New Text Document.txt

Using pathlib

from pathlib import Path
for f in Path('d:/temp').glob('**/*'):
    print(f)

Using os.scandir

os.scandir is what glob uses internally. So here is how to do it directly, using yield:

import os

def listpath(path):
    for entry in os.scandir(path):
        # entry.path already includes the parent directory
        if entry.is_dir():
            yield entry.path
            yield from listpath(entry.path)
        else:
            yield entry.path

for f in listpath('d:\\temp'):
    print(f)
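
To tie this back to the question, the same recursion can be narrowed to .txt files. This is only a sketch: the function name `iter_txt` and the sample file names are made up, and skipping symlinked directories with follow_symlinks=False is my own assumption (it avoids loops), not something glob is claimed to do.

```python
import os
import tempfile


def iter_txt(path, suffix=".txt"):
    """Recursively yield paths of regular files under path ending in suffix."""
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from iter_txt(entry.path, suffix)
            elif entry.is_file() and entry.name.endswith(suffix):
                yield entry.path


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: two .txt files and one file to be ignored.
    os.makedirs(os.path.join(root, "New folder"))
    for rel in ("a.txt", "b.log", os.path.join("New folder", "c.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x\n")

    found = sorted(os.path.relpath(p, root).replace(os.sep, "/")
                   for p in iter_txt(root))

print(found)
```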
Basj
3

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt')

This doesn't work in all cases; use glob2 instead:

configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt')
Vojtech Ruzicka
NILESH KUMAR
2

If you can install glob2 package...

import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext")  # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")

All filenames and folders:

all_ff = glob2.glob("C:\\top_directory\\**\\**")  
dreab
2

If you're running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.

from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")
Eugene Yarmash
1

You can use glob.glob() or glob.iglob() from the glob module to retrieve paths recursively from a directory and its subdirectories.

Syntax:

glob.glob(pathname, *, recursive=False) # pathname = '/path/to/the/directory' or subdirectory
glob.iglob(pathname, *, recursive=False)

In your example, it is possible to write like this:


import glob
import os

configfiles = glob.glob("C:/Users/sam/Desktop/**/*.txt", recursive=True)

for f in configfiles:
    print(f'Filename with path: {f}')
    print(f'Only filename: {os.path.basename(f)}')
    print(f'Filename without extensions: {os.path.splitext(os.path.basename(f))[0]}')

Output:

Filename with path: C:/Users/sam/Desktop/test_file.txt
Only filename: test_file.txt
Filename without extensions: test_file

Help: Documentation for os.path.splitext and documentation for os.path.basename.

Milovan Tomašević
0

As pointed out by Martijn, glob can only do this through the ** operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following returns a lazily evaluated iterator that behaves similarly:

import os, glob, itertools

configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                         for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))

Note that you can only iterate once over configfiles in this approach though. If you require a real list of configfiles that can be used in multiple operations you would have to create this explicitly by using list(configfiles).
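
The one-pass behaviour is easy to see on a small temporary tree. In this sketch the directory layout and the names `lazy`, `first_pass` and `second_pass` are made up for illustration; glob.escape() is added so that unusual characters in a directory name are not read as part of the pattern.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one .txt file in the root, one in a subfolder.
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x\n")

    lazy = itertools.chain.from_iterable(
        glob.iglob(os.path.join(glob.escape(dirpath), "*.txt"))
        for dirpath, _dirs, _files in os.walk(root)
    )
    first_pass = list(lazy)   # consumes the iterator
    second_pass = list(lazy)  # nothing is left to yield

print(len(first_pass), len(second_pass))
```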

f0xdx
0

Path.rglob() recurses all the way down to the deepest level of your directory structure. If you only want to go one level deep, do not use it.

I realize the OP was talking about using glob.glob. I believe this answers the intent, however, which is to search all subfolders recursively.

The rglob function recently produced a 100x speed-up for a data processing algorithm that had relied on the folder structure as a fixed assumption for the order of data reading. With rglob we were able to scan once through all files at or below a specified parent directory, save their names to a list (over a million files), and then use that list to decide which files to open at any later point, based on the file naming conventions alone rather than on which folder they were in.
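
A rough sketch of that "scan once, then select by name" idea follows. It is not the code from that project: the helper `build_index`, the `<sensor>_<run>.dat` naming convention and the sample files are all invented for illustration.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def build_index(root, pattern="*"):
    """Scan root once and map each file name to the paths that carry it."""
    index = defaultdict(list)
    for path in Path(root).rglob(pattern):
        if path.is_file():
            index[path.name].append(path)
    return index


with tempfile.TemporaryDirectory() as root:
    # Hypothetical naming convention: <sensor>_<run>.dat
    for rel in ("day1/s1_001.dat", "day1/s2_001.dat", "day2/nested/s1_002.dat"):
        target = Path(root, rel)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("data\n")

    index = build_index(root, "*.dat")
    # Later lookups use only the file names, not the folder layout.
    s1_files = sorted(name for name in index if name.startswith("s1_"))

print(s1_files)
```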

brethvoice