8

In Python on a GNU/Linux system, what's the fastest way to recursively scan a directory for all .MOV or .AVI files, and to store them in a list?

Keith Pinson
ensnare
  • Fastest probably involves writing an extension to use native code. But do you really want that? – David Heffernan Dec 24 '11 at 17:28
  • Even if you don't want to do that, depending on how many files and directories we're talking about, it might be faster to execute the external `find` command than processing the results of `os.walk()`. But if the `os.walk()` solution is fast enough, it is more elegant and easy to understand/edit. – Michael Hoffman Dec 24 '11 at 17:37
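
A rough sketch of the external `find` route mentioned in the comment above, assuming GNU `find` is on the PATH. The function name `find_with_external_command` is made up for illustration, and whether this beats `os.walk()` on a given tree is something to measure rather than assume.

```python
import os
import subprocess

def find_with_external_command(root="."):
    """Return paths under root ending in .mov or .avi (any case), via find(1)."""
    cmd = [
        "find", root, "-type", "f",
        "(", "-iname", "*.mov", "-o", "-iname", "*.avi", ")",
        "-print0",  # NUL-separated output copes with newlines in file names
    ]
    out = subprocess.run(cmd, check=True, stdout=subprocess.PIPE).stdout
    # Decode the raw bytes using the filesystem encoding
    return [os.fsdecode(p) for p in out.split(b"\0") if p]
```

Calling `find_with_external_command("/some/dir")` returns a plain list of strings.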

8 Answers

7

You can use os.walk() for recursive walking and glob.glob() or fnmatch.filter() for file matching:

Check this answer
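
A minimal sketch of the os.walk() plus fnmatch.filter() combination, in case the linked answer is unavailable. The function name `walk_and_filter` and the default patterns are assumptions for illustration. On Linux the match is case-sensitive.

```python
import fnmatch
import os

def walk_and_filter(root, patterns=("*.MOV", "*.AVI")):
    """Collect files under root whose names match any of the glob patterns."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for pattern in patterns:
            # fnmatch.filter returns the subset of names matching the pattern
            for name in fnmatch.filter(filenames, pattern):
                matches.append(os.path.join(dirpath, name))
    return matches
```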

Community
Aleksandra Zalcman
7

I'd use os.walk to scan the directory, os.path.splitext to grab the suffix and filter them myself.

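import os

suffixes = {'.AVI', '.MOV'}

def iter_video_files(top='.'):
    # Walk the tree and yield every file whose extension is in `suffixes`
    for dirpath, dirnames, filenames in os.walk(top):
        for f in filenames:
            if os.path.splitext(f)[1] in suffixes:
                yield os.path.join(dirpath, f)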
  • This is probably the best solution because it can be easily adapted to enforce case-insensitive matching. – ekhumoro Dec 24 '11 at 19:49
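
The case-insensitive adaptation suggested in the comment could look roughly like this. It is a sketch; the names `SUFFIXES` and `iter_videos_ci` are invented here, and the only real change is lower-casing the extension before the set lookup.

```python
import os

SUFFIXES = {".avi", ".mov"}  # stored lower-case on purpose

def iter_videos_ci(root="."):
    """Yield paths under root whose extension is .avi or .mov in any case."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lower-case the extension so .MOV, .Mov and .mov all match
            if os.path.splitext(name)[1].lower() in SUFFIXES:
                yield os.path.join(dirpath, name)
```

Wrap the call in `list(...)` to get the list the question asks for.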
4

This lists the matching files in the current directory only (glob.glob does not recurse by default). You can extend the pattern for specific paths.

import glob
movlist = glob.glob('*.mov')
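
To cover subdirectories as well, one option (assuming Python 3.5 or later) is the `**` wildcard together with `recursive=True`. The helper name `glob_videos` and the list of patterns are illustrative only. glob matching is case-sensitive on Linux, so each spelling is listed explicitly.

```python
import glob
import os

def glob_videos(root="."):
    """Return .mov/.avi files (upper or lower case) anywhere below root."""
    found = []
    for pattern in ("*.mov", "*.MOV", "*.avi", "*.AVI"):
        # "**" matches zero or more directory levels when recursive=True
        found.extend(glob.glob(os.path.join(root, "**", pattern), recursive=True))
    return found
```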
milancurcic
3
import os
import re

pattern = re.compile(r'.*\.(mov|MOV|avi|mpg)$')

def fileList(source):
   matches = []
   for root, dirnames, filenames in os.walk(source):
       for filename in filter(lambda name:pattern.match(name),filenames):
           matches.append(os.path.join(root, filename))
   return matches
Jhonathan
  • The [fnmatch](http://docs.python.org/library/fnmatch.html#module-fnmatch) module only supports very simple glob patterns, so your filter won't work. – ekhumoro Dec 24 '11 at 19:46
  • @ekhumoro if it works, symbols ([],.,?, *, ()) are allowed to glob, python test code and see which works – Jhonathan Dec 24 '11 at 20:01
  • Your pattern is equivalent to `*.[movMOVaipg()]`. This will match, for example, `*.i`, `*.a`, `*.M`, etc, but _not_ `*.MOV`, `*.avi`, etc. Try it for yourself! – ekhumoro Dec 24 '11 at 20:21
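
The point made in the last comment can be checked directly. This small sketch uses `fnmatch.fnmatchcase` (so the result does not depend on the platform) with the pattern quoted in that comment; the sample file names are made up.

```python
import fnmatch

# In a glob pattern, [...] is a character class that matches exactly one character
pattern = "*.[movMOVaipg()]"

for name in ("clip.M", "clip.i", "clip.MOV", "clip.avi"):
    print(name, fnmatch.fnmatchcase(name, pattern))
```

The first two names match because their extension is a single character from the class; the last two do not, because their extensions are three characters long.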
2

From Python 3.12 onwards, you can use Path.walk from the pathlib module.

Because it works with Path objects rather than string representations of paths, this module makes it easier to combine paths and lets you use the .suffix property.

from pathlib import Path


suffixes = set(['.AVI', '.MOV'])
files_with_suffix = list()

for root, dirs, files in Path(".").walk():
    for file in files:
        # Path.walk yields file names as plain strings, so build the Path first
        path = root / file
        if path.suffix in suffixes:
            files_with_suffix.append(path)
Jundiaius
2

I suggest using os.walk and reading its documentation carefully.

A one-liner approach might be:

[os.path.join(root, f) for root, dirs, files in os.walk('/your/path') for f in files if is_video(f)]

Here is_video is a function you write yourself to check the file's extension.
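
One possible `is_video`, as a sketch: the answer leaves it up to the reader, so the extension tuple and the wrapper `list_videos` below are assumptions, not part of the original.

```python
import os

VIDEO_EXTENSIONS = (".mov", ".avi")  # lower-case; compared case-insensitively

def is_video(filename):
    """Return True if filename ends with one of VIDEO_EXTENSIONS, ignoring case."""
    # str.endswith accepts a tuple of candidate suffixes
    return filename.lower().endswith(VIDEO_EXTENSIONS)

def list_videos(path):
    """Same shape as the one-liner, but keeps the directory part of each path."""
    return [os.path.join(root, f)
            for root, dirs, files in os.walk(path)
            for f in files if is_video(f)]
```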

Rik Poggi
1

Python 2.x (the code also runs unchanged on Python 3):

import os

def generic_tree_matching(rootdirname, filterfun):
    return [
        os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(rootdirname)
        for filename in filenames
        if filterfun(filename)]

def matching_ext(rootdirname, extensions):
    "Case sensitive extension matching"
    return generic_tree_matching(
        rootdirname,
        lambda fn: fn.endswith(extensions))

def matching_ext_ci(rootdirname, extensions):
    "Case insensitive extension matching"
    try:
        extensions = extensions.lower()
    except AttributeError: # assume it's a sequence of extensions
        extensions = tuple(
            extension.lower()
            for extension in extensions)
    return generic_tree_matching(
        rootdirname,
        lambda fn: fn.lower().endswith(extensions))

Use either matching_ext or matching_ext_ci, passing the root folder and either a single extension or a tuple of extensions:

>>> matching_ext(".", (".mov", ".avi"))
tzot
0

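You can also use pathlib for this. Here path is the root directory to search; note that on Linux the pattern is matched case-sensitively.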

from pathlib import Path

files_mov = list(Path(path).rglob("*.MOV"))
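
To pick up both extensions in any case with a single traversal, one option is to rglob everything and filter on `.suffix`. This is only a sketch; the function name `rglob_videos` and its `suffixes` parameter are invented for illustration.

```python
from pathlib import Path

def rglob_videos(path, suffixes=(".mov", ".avi")):
    """Return Path objects below path whose suffix matches, ignoring case."""
    wanted = {s.lower() for s in suffixes}
    # rglob("*") visits every entry below path; keep regular files only
    return [p for p in Path(path).rglob("*")
            if p.is_file() and p.suffix.lower() in wanted]
```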
H. Sánchez