
I'm trying to get a list of all log files (.log) in a directory, including all subdirectories.

munchybunch
  • This has already been asked: http://stackoverflow.com/questions/837606/find-the-oldest-file-recursively-in-a-directory, as well as a few others (search 'Python walk' in the search box) – Eli Bendersky Jun 05 '09 at 07:52

7 Answers

import os
import os.path

for dirpath, dirnames, filenames in os.walk("."):
    for filename in [f for f in filenames if f.endswith(".log")]:
        print(os.path.join(dirpath, filename))
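This minimal Python 3 sketch folds in several suggestions from the comments on this answer: the start directory is a parameter, a generator replaces the list comprehension, and an unwanted directory such as old_logs is pruned from dirnames. The function name iter_log_files and its exclude parameter are illustrative and not part of the original answer.

```python
import os


def iter_log_files(root=".", exclude=()):
    """Yield paths of *.log files under root, skipping directories named in exclude."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place (slice assignment) stops os.walk from
        # descending into those directories; this relies on topdown=True,
        # which is the default.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        # A generator expression avoids building an intermediate list.
        for filename in (f for f in filenames if f.endswith(".log")):
            yield os.path.join(dirpath, filename)
```

For example, `for path in iter_log_files(sys.argv[1], exclude={"old_logs"}): print(path)` walks the directory given on the command line. The slice assignment matters: rebinding dirnames to a new list would not affect which directories os.walk visits.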
  • If you want to search a directory other than ".", you could pass it as sys.argv[1] and call os.walk(sys.argv[1]). –  Jun 05 '09 at 07:12
  • Additional improvement: Use a generator instead of list comprehension: for filename in (f for f ...) –  Jun 05 '09 at 07:17
  • If you want to exclude a certain directory, e.g., `old_logs`, you can simply remove it from `dirnames` and it won't be searched: `if "old_logs" in dirnames: dirnames.remove("old_logs")` – stefanbschneider Feb 02 '17 at 09:21
  • Any faster method, like using multiprocessing.Pool() or something? – Hzzkygcs Mar 31 '20 at 17:43
  • Since Python 3 print is a function and must be called like this: `print(os.path.join(dirpath, filename))` – volkit Aug 27 '20 at 13:57

You can also use the glob module along with os.walk.

import os
from glob import glob

files = []
start_dir = os.getcwd()
pattern   = "*.log"

for dir,_,_ in os.walk(start_dir):
    files.extend(glob(os.path.join(dir,pattern))) 
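As one of the comments points out, os.walk already returns each directory's file names, so they can be filtered with fnmatch.filter instead of calling glob once per directory. This is a sketch of that variant; find_files is a hypothetical helper name.

```python
import fnmatch
import os


def find_files(start_dir, pattern="*.log"):
    """Return paths under start_dir whose file names match a shell-style pattern."""
    matches = []
    # os.walk already lists each directory's files, so filter that list
    # rather than having glob read the directory a second time.
    for dirpath, _, filenames in os.walk(start_dir):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return matches
```

The behavior differs slightly from the glob version. For a pattern like *.log, glob skips names that start with a dot and can also return directories whose names match. fnmatch.filter here only sees the non-directory entries that os.walk lists, and it does match dot-files.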
Shawn Chin
  • What do the underscores do in the for-loop? '_' – nu everest Nov 06 '15 at 18:40
  • @nueverest `os.walk` returns a 3-tuple `(dirpath, dirnames, filenames)` at each iteration, and we're only interested in `dirpath` (assigned to `dir` above); the underscores are just used as placeholders for the other 2 values we're not interested in (i.e. `dirnames`, and then `filenames`, are being assigned to the variable `_`, which we will never use). – tavnab Apr 14 '16 at 23:03
  • Why run `glob` and do extra I/O, when you already have the list of `filenames` which you could filter with [`fnmatch.filter`](https://docs.python.org/3/library/fnmatch.html#fnmatch.filter)? – Cristian Ciupitu Mar 15 '18 at 03:09
  • This redefines the `dir` function, use `for directory,_,_ ...` instead. – Chris Collett Feb 05 '21 at 19:17

Check out Python Recursive Directory Walker. In short, os.listdir() and os.walk() are your friends.
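For comparison with os.walk(), a hand-rolled recursive walker built only on os.listdir() could look like the following minimal sketch. walk_logs is a hypothetical name, and this is not the code from the linked walker.

```python
import os


def walk_logs(directory, suffix=".log"):
    """Recursively collect files ending in suffix, using only os.listdir."""
    found = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Do not follow symlinked directories (like os.walk's default
        # followlinks=False), so a link cycle cannot recurse forever.
        if os.path.isdir(path) and not os.path.islink(path):
            found.extend(walk_logs(path, suffix))
        elif name.endswith(suffix) and os.path.isfile(path):
            found.append(path)
    return found
```

os.walk() does the same recursion for you, so it is usually the simpler choice.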

ismail

A single line solution using only (nested) list comprehension:

import os

path_list = [os.path.join(dirpath,filename) for dirpath, _, filenames in os.walk('.') for filename in filenames if filename.endswith('.log')]
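As the PEP 8 comment below suggests, the same comprehension reads more easily when it is split across lines and wrapped in a function. Here is a sketch of that, with find_log_files as an illustrative name.

```python
import os


def find_log_files(root="."):
    """Return the paths of all .log files under root (same logic as the one-liner)."""
    return [
        os.path.join(dirpath, filename)
        for dirpath, _, filenames in os.walk(root)
        for filename in filenames
        if filename.endswith(".log")
    ]
```

path_list = find_log_files(".") then produces the same list as the one-liner.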
Frederik Baetens
  • This "one-liner" is excessive. If you're going over 79 characters (see [PEP 8](https://www.python.org/dev/peps/pep-0008/#maximum-line-length)), it takes away from readability and should either be split into multiple lines or made into a function (preferred). – Chris Collett Feb 05 '21 at 19:23
  • That's true, I posted this mostly for the simplicity & list comprehension. It's indeed nice to split this over multiple lines. – Frederik Baetens Feb 06 '21 at 19:30

I have a solution:

import os
for logfile in os.popen("find . -type f -name '*.log'").read().split('\n')[0:-1]:
    print(logfile)

or

import subprocess
# text=True (Python 3.7+) makes out a str instead of bytes
(out, err) = subprocess.Popen(["find", ".", "-type", "f", "-name", "*.log"], stdout=subprocess.PIPE, text=True).communicate()
for logfile in out.split('\n')[0:-1]:
    print(logfile)

Both take advantage of find . -type f -name '*.log'.

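The first one is simpler, but it is not guaranteed to be whitespace-safe once -name *.log is added, although it worked fine for a plain find ../testdata -type f (in my OS X environment). Also note that an unquoted *.log inside the shell string can be expanded by the shell before find sees it, so it should be written as '*.log'.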

The second one, using subprocess, looks more complicated, but it is the whitespace-safe one (again, in my OS X environment), since the arguments are passed to find directly rather than through a shell.

This was inspired by Chris Bunch's answer at https://stackoverflow.com/a/3503909/2834102
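Here is a further variant, assuming a Unix-like system whose find supports -print0 (GNU and BSD find both do). subprocess.run avoids the shell entirely, and NUL-separated output keeps file names containing spaces or even newlines intact. The helper name find_with_find is hypothetical.

```python
import os
import subprocess


def find_with_find(start=".", pattern="*.log"):
    """List matching files by running the external find command (Unix-like systems only)."""
    # Arguments are passed as a list, so no shell is involved and the
    # pattern is not glob-expanded before find sees it. -print0 ends each
    # result with a NUL byte, which cannot occur in file names.
    # check=True raises CalledProcessError if find exits non-zero
    # (for example, on an unreadable subdirectory).
    result = subprocess.run(
        ["find", start, "-type", "f", "-name", pattern, "-print0"],
        stdout=subprocess.PIPE,
        check=True,
    )
    # Split on NUL, drop the empty string after the final separator, and
    # decode each name with the filesystem encoding.
    return [os.fsdecode(p) for p in result.stdout.split(b"\0") if p]
```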

paracosmo

Using the standard library's pathlib:

from pathlib import Path

working_dir = Path()
for path in working_dir.glob("**/*.log"):
    print(path)
    # OR if you need absolute paths
    print(path.absolute())
    # OR if you need only filenames without extension for further parsing
    print(path.stem)
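Path.rglob(pattern) behaves like Path.glob() with "**/" prepended to the pattern, so the loop above can also be written as shown here. This minimal sketch (log_files is an illustrative name) also keeps only regular files, in case a directory happens to be named something.log.

```python
from pathlib import Path


def log_files(root="."):
    """Return sorted paths of regular files under root whose names end in .log."""
    # rglob("*.log") matches at any depth, like glob("**/*.log");
    # is_file() drops directories whose names end in .log.
    return sorted(p for p in Path(root).rglob("*.log") if p.is_file())
```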
pkubik

If you want to list the current directory, you can use something like:

import os

for e in os.walk(os.getcwd()):
    print(e)

Just change os.getcwd() to another path to get results there.

praavDa
  • This answer doesn't address the OP's question and isn't relevant to most people who would be seeking the same answer. – Andrew Aug 08 '17 at 19:40