In Python on a GNU/Linux system, what's the fastest way to recursively scan a directory for all .MOV or .AVI files, and to store them in a list?

Fastest probably involves writing an extension that uses native code. But do you really want that? – David Heffernan Dec 24 '11 at 17:28
Even if you don't want to do that, depending on how many files and directories we're talking about, it might be faster to execute the external `find` command than processing the results of `os.walk()`. But if the `os.walk()` solution is fast enough, it is more elegant and easy to understand/edit. – Michael Hoffman Dec 24 '11 at 17:37
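To illustrate the `find` route mentioned in the comment above, here is a minimal sketch that shells out to GNU `find` and collects its output into a list. The function name `find_with_find_cmd` is made up for this example, and matching with `-iname` (case-insensitive) is an assumption about what "all .MOV or .AVI files" should include. Whether this actually beats `os.walk()` depends on the tree, so time it on your own data.

```python
import os
import subprocess


def find_with_find_cmd(root):
    """Sketch: let GNU find do the walk and the matching.

    Hypothetical helper; matches *.mov / *.avi case-insensitively.
    """
    cmd = [
        "find", root, "-type", "f",
        "(", "-iname", "*.mov", "-o", "-iname", "*.avi", ")",
        "-print0",  # NUL-separated output survives names containing newlines
    ]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    # Split on NUL bytes and decode using the filesystem encoding.
    return [os.fsdecode(p) for p in result.stdout.split(b"\0") if p]


videos = find_with_find_cmd(".")
```

Note that `check=True` raises `subprocess.CalledProcessError` if `find` exits non-zero, which it does, for example, when it hits a directory it cannot read.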
8 Answers
You can use os.walk() for recursive walking and glob.glob() or fnmatch.filter() for file matching:
Check this answer
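A minimal sketch of the `os.walk()` + `fnmatch.filter()` combination, assuming the shell-style patterns `*.MOV` and `*.AVI` (the helper name `find_with_fnmatch` is hypothetical). On Linux `fnmatch.filter` is case-sensitive, so `clip.mov` is not matched by `*.MOV`.

```python
import fnmatch
import os


def find_with_fnmatch(root, patterns=("*.MOV", "*.AVI")):
    """Sketch: os.walk for the recursion, fnmatch.filter for the matching."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for pattern in patterns:
            # fnmatch.filter returns the subset of names matching one pattern.
            for name in fnmatch.filter(filenames, pattern):
                matches.append(os.path.join(dirpath, name))
    return matches


videos = find_with_fnmatch(".")
```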

I'd use os.walk to scan the directory, os.path.splitext to grab the suffix and filter them myself.
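```python
import os

def find_videos(root='.'):
    suffixes = {'.AVI', '.MOV'}
    for dirpath, dirnames, filenames in os.walk(root):
        for f in filenames:
            # splitext('clip.MOV') returns ('clip', '.MOV'); compare the suffix only
            if os.path.splitext(f)[1] in suffixes:
                yield os.path.join(dirpath, f)

videos = list(find_videos('.'))
```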
This is probably the best solution because it can be easily adapted to enforce case-insensitive matching. – ekhumoro Dec 24 '11 at 19:49
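Following that comment, one way to make the `splitext` version case-insensitive is to lowercase the suffix before the set lookup. This is a sketch; `find_videos_ci` and its default `suffixes` are illustrative names, not part of the answer.

```python
import os


def find_videos_ci(root, suffixes=(".avi", ".mov")):
    """Yield paths whose extension matches, ignoring case (.MOV, .mov, .Mov, ...)."""
    wanted = {s.lower() for s in suffixes}
    for dirpath, dirnames, filenames in os.walk(root):
        for f in filenames:
            # Lowercase only the extension, then do a set lookup.
            if os.path.splitext(f)[1].lower() in wanted:
                yield os.path.join(dirpath, f)


videos = list(find_videos_ci("."))
```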
Example for listing files in the current directory only (`glob.glob('*.mov')` does not descend into subdirectories). You can adapt this for specific paths.
```python
import glob
movlist = glob.glob('*.mov')
```
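Since Python 3.5, `glob.glob` can recurse when the pattern contains `**` and `recursive=True` is passed. A sketch, with the helper name `glob_videos` made up for illustration:

```python
import glob
import os


def glob_videos(root):
    """Sketch: recursive glob. '**' with recursive=True matches any
    number of directory levels, including zero."""
    matches = []
    for pattern in ("*.MOV", "*.AVI"):
        matches.extend(glob.glob(os.path.join(root, "**", pattern), recursive=True))
    return matches


movlist = glob_videos(".")
```

Two caveats: on Linux the match is case-sensitive, and `glob` skips names starting with `.` by default.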

```python
import os
import re

# Raw string so that \. reaches the regex engine as an escaped dot
pattern = re.compile(r'.*\.(mov|MOV|avi|mpg)$')

def fileList(source):
    matches = []
    for root, dirnames, filenames in os.walk(source):
        for filename in filter(pattern.match, filenames):
            matches.append(os.path.join(root, filename))
    return matches
```
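The regex above lists each spelling separately (`mov|MOV`), so it misses names such as `clip.AVI` or `clip.Mov`. A sketch of the same walk using `re.IGNORECASE` instead; `VIDEO_RE` and `file_list_ci` are hypothetical names:

```python
import os
import re

# One case-insensitive pattern instead of listing every spelling.
VIDEO_RE = re.compile(r"\.(mov|avi|mpg)$", re.IGNORECASE)


def file_list_ci(source):
    matches = []
    for root, dirnames, filenames in os.walk(source):
        for filename in filenames:
            # search() looks for the extension anchored at the end of the name.
            if VIDEO_RE.search(filename):
                matches.append(os.path.join(root, filename))
    return matches
```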

The [fnmatch](http://docs.python.org/library/fnmatch.html#module-fnmatch) module only supports very simple glob patterns, so your filter won't work. – ekhumoro Dec 24 '11 at 19:46
@ekhumoro It does work: the symbols [], ., ?, * and () are allowed in glob patterns. Test the code in Python and see which works. – Jhonathan Dec 24 '11 at 20:01
Your pattern is equivalent to `*.[movMOVaipg()]`. This will match, for example, `*.i`, `*.a`, `*.M`, etc, but _not_ `*.MOV`, `*.avi`, etc. Try it for yourself! – ekhumoro Dec 24 '11 at 20:21
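The point in that comment can be checked with `fnmatch.fnmatchcase`, which applies the same shell-style rules without case folding. A small demonstration; the helper `show_matches` and the sample names are made up:

```python
from fnmatch import fnmatchcase


def show_matches(pattern, names):
    """Return {name: whether fnmatch considers it a match for pattern}."""
    return {name: fnmatchcase(name, pattern) for name in names}


names = ["clip.i", "clip.a", "clip.M", "clip.MOV", "clip.avi", "clip.mov"]

# A bracket expression matches exactly ONE character from the set.
print(show_matches("*.[movMOVaipg()]", names))

# '(' '|' ')' have no special meaning in glob patterns; they are literal.
print(show_matches("*.(mov|MOV)", names))
```

With the bracket pattern only `clip.i`, `clip.a` and `clip.M` match; with the parenthesised pattern none of the sample names match, because it only accepts a name that literally ends in `.(mov|MOV)`.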
From Python 3.12 onwards, it is possible to use Path.walk from the pathlib module.
By using Path objects instead of string representations of paths, this module makes it easier to combine paths, and it lets you use the .suffix property:
```python
from pathlib import Path

suffixes = {'.AVI', '.MOV'}
files_with_suffix = []
for root, dirs, files in Path(".").walk():
    for name in files:
        path = root / name  # walk() yields file names as str; join with root to get a Path
        if path.suffix in suffixes:
            files_with_suffix.append(path)
```

I suggest using os.walk and reading its documentation carefully.
This may be a one-liner approach:
```python
import os
videos = [os.path.join(root, f) for root, dirs, files in os.walk('/your/path') for f in files if is_video(f)]
```
where is_video is a function you write that checks the extension.
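For completeness, here is a sketch of what `is_video` might look like, together with one detail from the `os.walk` documentation worth knowing: in the default top-down mode, editing `dirnames` in place controls which subdirectories are visited. The extension tuple, the name `walk_videos`, and the choice to skip hidden directories are all assumptions for illustration.

```python
import os

VIDEO_EXTENSIONS = (".mov", ".avi")  # assumption: the extensions you care about


def is_video(filename):
    """Hypothetical helper: case-insensitive extension check."""
    return filename.lower().endswith(VIDEO_EXTENSIONS)


def walk_videos(root):
    videos = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment edits the list os.walk is using, so hidden
        # directories are pruned and never descended into.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        videos.extend(os.path.join(dirpath, f) for f in filenames if is_video(f))
    return videos
```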

Python 2.x:
```python
import os

def generic_tree_matching(rootdirname, filterfun):
    return [
        os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(rootdirname)
        for filename in filenames
        if filterfun(filename)]

def matching_ext(rootdirname, extensions):
    "Case sensitive extension matching"
    return generic_tree_matching(
        rootdirname,
        lambda fn: fn.endswith(extensions))

def matching_ext_ci(rootdirname, extensions):
    "Case insensitive extension matching"
    try:
        extensions = extensions.lower()
    except AttributeError:  # assume it's a sequence of extensions
        extensions = tuple(
            extension.lower()
            for extension in extensions)
    return generic_tree_matching(
        rootdirname,
        lambda fn: fn.lower().endswith(extensions))
```
Use either matching_ext or matching_ext_ci, passing the root folder and either a single extension or a tuple of extensions:
```python
>>> matching_ext(".", (".mov", ".avi"))
```

You can also use pathlib for this.
```python
from pathlib import Path

# path is the directory to search; rglob() also descends into subdirectories
files_mov = list(Path(path).rglob("*.MOV"))
```
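`rglob` takes a single pattern per call, so matching both extensions means calling it once per pattern. A sketch (the name `rglob_videos` is hypothetical). On Linux the matching is case-sensitive by default, so `*.MOV` will not pick up `clip.mov`; Python 3.12 added a `case_sensitive` keyword to `glob`/`rglob` if you need to change that.

```python
from pathlib import Path


def rglob_videos(path, patterns=("*.MOV", "*.AVI")):
    """Collect recursive matches for several patterns, one rglob call each."""
    files = []
    for pattern in patterns:
        files.extend(Path(path).rglob(pattern))
    return files


videos = rglob_videos(".")
```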
