29

I need to specify multiple file extensions, something like pathlib.Path(temp_folder).glob('*.xls', '*.txt').

How can I do it?

https://docs.python.org/dev/library/pathlib.html#pathlib.Path.glob

Dmitry Bubnenkov

5 Answers

26

A bit late to the party, but here are a couple of single-line suggestions that require neither a custom function nor a loop, and that work on Linux:

pathlib.Path.glob() accepts bracket expressions, each of which matches exactly one character from the listed set. To cover the ".txt" and ".xls" suffixes, interleave their letters position by position:

files = pathlib.Path('temp_dir').glob('*.[tx][xl][ts]')

If you need to search for ".xlsx" as well, just append the wildcard "*" after the last closing bracket.

files = pathlib.Path('temp_dir').glob('*.[tx][xl][ts]*')

Keep in mind that the trailing wildcard matches not only the "x" but any characters that follow the final "t" or "s".

Prepending "**/" to the search pattern makes the search recursive, as discussed in other answers.
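
To see what the bracket pattern actually matches, here is a minimal sketch. It assumes a throwaway directory with made-up file names (a.txt, a.xls, a.tls, a.png), none of which come from the question. Because each bracket expression matches a single character independently, a.tls is picked up alongside the two intended files:

```python
import tempfile
from pathlib import Path

# Hypothetical file names, created in a temporary directory for the demo.
names = ['a.txt', 'a.xls', 'a.tls', 'a.png']

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        (Path(tmp) / name).touch()
    # Position 1 is 't' or 'x', position 2 is 'x' or 'l', position 3 is 't' or 's'.
    matched = sorted(p.name for p in Path(tmp).glob('*.[tx][xl][ts]'))

print(matched)
```

If the extra matches are a problem, filtering on path.suffix against an explicit set of extensions avoids them.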

dnt2s
  • 12
    A caveat with the one-liner solution is that this will also match files with extensions .txs, .tlt, .tls, .xxs and .xlt. – Rajan Ponnappan Jun 05 '21 at 23:56
  • 4
    This doesn't seem to work for me on OS X, Python 3.10, no matches – fmalina May 09 '22 at 12:13
  • Same on Ubuntu 22.04LTS with Python 3.10. Doesn't work for `Path(my_directory).glob("*.[sql][SQL]")` but interestingly it works with `Path(my_directory).glob("*[sSqQlL]")`. – kerfuffle Oct 06 '22 at 15:27
  • But as @RajanPonnappan said: it will also work for things like ".ssl", haha – kerfuffle Oct 06 '22 at 15:47
  • @Rajan Ponnappan, Jun 5, 2021 at 23:56: This was somewhat hinted in the second to last sentence in the answer above. Thank you for making it explicit. – dnt2s Aug 04 '23 at 07:48
  • @kerfuffle, Oct 6, 2022 at 15:27: Both of your attempts differ from the suggested solution in the answer. – dnt2s Aug 04 '23 at 07:53
  • @dnt2s Help me out here: how does `Path(my_directory).glob("*.[sql][SQL]")` differ from one of your solutions, `Path('temp_dir').glob('*.[tx][xl][ts]')`? They do not look any different semantically. – kerfuffle Aug 09 '23 at 08:36
  • @kerfuffle As a simple test put the files a.sql, a.SQL, a.sS, a.qQ and a.lL in a directory. Run "ls *.[sS][qQ][lL]" in that directory and you will see "a.sql a.SQL" (Note the missing "a.sS a.qQ a.lL"). If you then run "ls *.[sql][SQL]" you will see "a.sS a.qQ a.lL" (Note the missing "a.sql a.SQL"). That's how Path.glob() interprets the different letter combinations in the brackets – dnt2s Aug 12 '23 at 05:34
20

If you need to use pathlib.Path.glob(), you can call it once per pattern and combine the results:

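from pathlib import Path


def get_files(extensions):
    """Collect the paths matching any of the given glob patterns."""
    all_files = []
    for ext in extensions:
        # One glob() call per pattern; the results are concatenated.
        all_files.extend(Path('.').glob(ext))
    return all_files


files = get_files(('*.txt', '*.py', '*.cfg'))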
phoenix
mattjvincent
9

You can also use the ** syntax from pathlib, which lets you collect nested paths recursively.

from pathlib import Path
import re


BASE_DIR = Path('.')
EXTENSIONS = {'.xls', '.txt'}

for path in BASE_DIR.glob(r'**/*'):
    if path.suffix in EXTENSIONS:
        print(path)
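
A hedged variation on the loop above: a small helper, here called find_by_suffix (a name invented for this sketch), that walks the tree once with rglob('*') and compares lowercased suffixes, so that .XLS and .xls are treated alike. The directory layout in the demo is made up for illustration.

```python
import tempfile
from pathlib import Path


def find_by_suffix(base_dir, extensions):
    """Yield files under base_dir whose lowercased suffix is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    # rglob('*') is equivalent to glob('**/*'): a single recursive walk.
    for path in Path(base_dir).rglob('*'):
        if path.is_file() and path.suffix.lower() in wanted:
            yield path


# Demo on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    for name in ['a.txt', 'sub/b.XLS', 'sub/c.png']:
        (root / name).touch()
    found = sorted(p.name for p in find_by_suffix(root, {'.xls', '.txt'}))

print(found)
```

Whether case-insensitive matching is wanted depends on your files; drop the .lower() calls to keep the comparison exact.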

If you want to express more logic in your search you can also use a regex as follows:

pattern_sample = re.compile(r'/(([^/]+/)+)(S(\d+)_\d+)\.(tif|JPG)')

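This pattern looks for all images (tif and JPG) that match S327_008(_flipped)?.tif in my case. Specifically, it captures the sample id and the file name. Note that re.match anchors at the start of the string and the pattern begins with '/', so it only matches absolute POSIX-style paths; BASE_DIR has to be absolute (for example Path('.').resolve()) for anything to match.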

Collecting into a set prevents storing duplicates. I have found this useful when adding more logic, for example to ignore different versions of the same file (_flipped).

matched_images = set()

for item in BASE_DIR.glob(r'**/*'):
    match = re.match(pattern=pattern_sample, string=str(item))
    if match:
        # retrieve the groups of interest
        filename, sample_id = match.group(3, 4)
        matched_images.add((filename, int(sample_id)))
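
As a self-contained illustration of the same idea, the sketch below uses a variant of the pattern rather than the one above: the dot is escaped, an optional "_flipped" part is allowed, and matching is done on the file name only instead of the full path. The helper name collect_samples and the example paths are invented for this sketch.

```python
import re
from pathlib import PurePosixPath

# Hypothetical variant of the pattern: escaped dot, optional "_flipped",
# applied to the file name only.
SAMPLE_PATTERN = re.compile(r'(S(\d+)_\d+)(?:_flipped)?\.(?:tif|JPG)$')


def collect_samples(paths):
    """Return a set of (file name stem without '_flipped', sample id) pairs."""
    matched = set()
    for path in paths:
        match = SAMPLE_PATTERN.match(PurePosixPath(path).name)
        if match:
            filename, sample_id = match.group(1, 2)
            matched.add((filename, int(sample_id)))
    return matched


# Made-up paths, for illustration only.
example_paths = [
    'data/run1/S327_008.tif',
    'data/run1/S327_008_flipped.tif',
    'data/run2/S12_001.JPG',
    'data/run2/notes.txt',
]
samples = collect_samples(example_paths)
print(samples)
```

The flipped and unflipped versions of S327_008 collapse into a single entry because the set stores identical tuples only once.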
Mr_and_Mrs_D
leoburgy
  • I like this one better since you only have to walk the directory tree once instead of n * len(extensions). Thanks for posting this. – nicholishen Apr 19 '20 at 03:34
2

Suppose that the following folder structure is prepared.

folder
├── test1.png
├── test1.txt
├── test1.xls
├── test2.png
├── test2.txt
└── test2.xls

The simple answer using pathlib.Path is as follows.

from pathlib import Path

ext = ['.txt', '.xls']
folder = Path('./folder')

# Get a list of pathlib.PosixPath
path_list = sorted(filter(lambda path: path.suffix in ext, folder.glob('*')))
print(path_list)
# [PosixPath('folder/test1.txt'), PosixPath('folder/test1.xls'), PosixPath('folder/test2.txt'), PosixPath('folder/test2.xls')]

If you want the paths as a list of strings, convert each one with .as_posix().

# Get a list of string paths
path_list = sorted([path.as_posix() for path in filter(lambda path: path.suffix in ext, folder.glob('*'))])
print(path_list)
# ['folder/test1.txt', 'folder/test1.xls', 'folder/test2.txt', 'folder/test2.xls']
Keiku
1

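A four-liner solution (plus imports) based on the question "Check if string ends with one of the strings from a list":

import os
from glob import glob

folder = '.'
suffixes = ('xls', 'txt')
# str.endswith() accepts a tuple and returns True if any suffix matches
filter_function = lambda x: x.endswith(suffixes)
list(filter(filter_function, glob(os.path.join(folder, '*'))))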
itamar kanter