I need to iterate through the subdirectories of a given directory and search for files. When I find a file, I have to open it and replace its contents with my own lines.

I tried this:

import os

rootdir ='C:/Users/sid/Desktop/test'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f=open(file,'r')
        lines=f.readlines()
        f.close()
        f=open(file,'w')
        for line in lines:
            newline = "No you are not"
            f.write(newline)
        f.close()

but I am getting an error. What am I doing wrong?

AvidLearner
Wolf
  • "An error" - any error in particular? – Daniel Roseman Oct 25 '13 at 10:18
  • @DanielRoseman He's not supposed to. Code is right. – Games Brainiac Oct 25 '13 at 10:20
  • Please could you explain a little about what you hope to do with the files / directories once you get the walk through them working as intended? Also please provide error details. – ChrisProsser Oct 25 '13 at 10:31
  • Basically, I have a root directory containing some subdirectories, and these subdirectories contain files. Say I have a cool.txt file in one of these subdirectories: I want to open cool.txt, read all the lines, and replace every line with "No you are not". – Wolf Oct 25 '13 at 10:37
  • The error message I'm getting is that the file cool.txt is not found. In my test folder I have another folder called src, in the src folder I have another folder called main, and in that folder I have cool.txt. – Wolf Oct 25 '13 at 10:38
  • Can you just write the error in the question? It's beyond annoying and unnecessary to have to read through the comments to find it. – Charlie Parker Jul 16 '16 at 18:36
  • Over a year later, I can't believe I'm back requesting that the error be posted. @Wolf – Charlie Parker Oct 26 '17 at 22:45
  • The question is answered, but the error you will get from this is: FileNotFoundError: [Errno 2] No such file or directory: 'file.xml'. As seen in the marked answer, he had to give the absolute path to the file. – Martin Apr 11 '18 at 07:16

3 Answers

The actual walk through the directories works as you have coded it. If you replace the contents of the inner loop with a simple print statement you can see that each file is found:

import os
rootdir = 'C:/Users/sid/Desktop/test'

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(os.path.join(subdir, file))
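
The error in the question most likely comes from the fact that os.walk yields bare file names, so open(file) looks in the current working directory instead of in subdir. A minimal sketch of the fix, joining subdir and the file name before opening, might look like this. The helper name overwrite_files and the trailing newline in the replacement text are assumptions for illustration, and the sketch assumes every file is a text file readable in the default encoding:

```python
import os


def overwrite_files(rootdir, new_line="No you are not\n"):
    """Replace every line of every file under rootdir with new_line.

    Hypothetical helper: returns the list of paths that were rewritten.
    """
    rewritten = []
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            # os.walk yields bare names; join with subdir to get a usable path
            path = os.path.join(subdir, name)
            with open(path, "r") as f:
                line_count = len(f.readlines())
            # Reopen for writing (truncates), then write one new line per old line
            with open(path, "w") as f:
                f.write(new_line * line_count)
            rewritten.append(path)
    return rewritten
```

Calling overwrite_files('C:/Users/sid/Desktop/test') would then rewrite cool.txt wherever it sits below the root, since each path is built from the directory in which the file was found.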
szymmirr
ChrisProsser
  • The print statement outputs:

        C:/Users/sid/Desktop/test\src\app/cool.txt
        C:/Users/sid/Desktop/test\src\app/woohoo.txt

    So in the open statement of my code, I think I have to give the absolute path to the file:

        import os
        rootdir ='C:/Users/spemmara/Desktop/test/src/app/'
        for subdir, dirs, files in os.walk(rootdir):
            for file in files:
                f=open(subdir+'/'+ file,'r')
                lines=f.readlines()
                f.close()
                f=open(subdir+'/'+file,'w')
                for line in lines:
                    newline = "hey i know"
                    f.write(newline)
                f.close()

    Thanks, it's solved. – Wolf Oct 25 '13 at 10:55
  • Sir, I am getting IsADirectoryError. – Akshat Zala Nov 07 '20 at 11:40

Another way of returning all files in subdirectories is to use the pathlib module, introduced in Python 3.4, which provides an object-oriented approach to handling filesystem paths (pathlib is also available on Python 2.7 via the pathlib2 module on PyPI):

from pathlib import Path

rootdir = Path('C:/Users/sid/Desktop/test')
# Return a list of regular files only, not directories
file_list = [f for f in rootdir.glob('**/*') if f.is_file()]

# For absolute paths instead of paths relative to the current directory
file_list = [f for f in rootdir.resolve().glob('**/*') if f.is_file()]

Since Python 3.5, the glob module also supports recursive file finding:

import os
from glob import iglob

rootdir_glob = 'C:/Users/sid/Desktop/test/**/*' # Note the added asterisks
# This will return absolute paths
file_list = [f for f in iglob(rootdir_glob, recursive=True) if os.path.isfile(f)]

The file_list from either of the above approaches can be iterated over without the need for a nested loop:

for f in file_list:
    print(f) # Replace with desired operations
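
To connect this back to the question, here is a hedged sketch of what the "desired operations" could look like with pathlib. It uses Path.rglob('*'), which is equivalent to glob('**/*'), and Path.write_text to overwrite each file. The function name replace_contents and the default text are illustrative assumptions, not part of the original answer:

```python
from pathlib import Path


def replace_contents(rootdir, text="No you are not\n"):
    """Overwrite every regular file below rootdir with text.

    Hypothetical helper: returns the sorted list of Path objects it changed.
    """
    changed = []
    # sorted() collects all matches first, before any file is modified
    for path in sorted(Path(rootdir).rglob("*")):
        if path.is_file():
            path.write_text(text)  # truncates the file, then writes text
            changed.append(path)
    return changed
```

Unlike the loop in the question, this replaces the whole file with a single copy of text rather than writing one line per original line, so adjust it if the line count matters.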
joelostblom
  • What is preferable here for Python 3.6? – PhoenixDev Jun 19 '17 at 12:54
  • @PhoenixDev I haven't heard of one approach being recommended over the other in general. I prefer using `pathlib` myself, mostly because I like the object-oriented method syntax. There are other differences: for example, `pathlib` returns specific path classes rather than strings, and the available functions differ between the libraries (e.g. `os.path.expanduser('~')` vs `Path.home()`). Browse through the documentation and see which approach you prefer. – joelostblom Jun 20 '17 at 13:36
  • Instead of adding `**` in the glob pattern, you can use [`rglob`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.rglob). – Georgy Jan 20 '19 at 11:14
  • In your code, what exactly is `rootdir_glob` for? Is it just a full path example? BTW, short version: `import os, glob` + `file_list = [f for f in glob.iglob('**/*', recursive=True) if os.path.isfile(f)]` – jave.web Feb 06 '21 at 15:53
  • @jave.web Good catch, I believe I meant to use it in the glob so it is equivalent to the first pathlib example. Updated. – joelostblom Feb 06 '21 at 18:10
  • @joelostblom I also suggest using `os` for creating cross-platform paths :) Anyway, based on this idea of yours, today I created a little tool to count all files in **N** directory trees and their sizes, so you can quickly do a basic integrity pre-check. I've published it at https://ideone.com/4pu1qs; there are some "typos" in the prints, but otherwise it works like a charm :) – jave.web Feb 06 '21 at 22:23
  • @jave.web Definitely agree, I just used the same as in the question here, otherwise I prefer pathlib's `/` approach. Will check out the link! – joelostblom Feb 06 '21 at 22:52

From Python 3.5 onward, you can use the ** pattern with glob.iglob(path/**, recursive=True), which seems to be the most Pythonic solution:

import glob, os

for filename in glob.iglob('/pardadox-music/**', recursive=True):
    if os.path.isfile(filename): # filter dirs
        print(filename)

Output:

/pardadox-music/modules/her1.mod
/pardadox-music/modules/her2.mod
...

Notes:

  1. glob.iglob

    glob.iglob(pathname, recursive=False)

    Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

  2. If recursive is True, the pattern '**' will match any files and zero or more directories and subdirectories.

  3. If the directory contains files starting with . they won’t be matched by default. For example, consider a directory containing card.gif and .card.gif:

    >>> import glob
    >>> glob.glob('*.gif')
    ['card.gif']
    >>> glob.glob('.c*')
    ['.card.gif']

  4. You can also use pathlib's Path.rglob(pattern), which is the same as calling Path.glob() with **/ added in front of the given relative pattern.
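
Notes 3 and 4 interact in a way that is easy to miss: as far as I can tell, the dotfile rule applies to the glob module, while pathlib's glob and rglob do not skip names starting with a dot. The sketch below compares the two on a temporary directory holding the card.gif and .card.gif files from note 3. The function name list_files_both_ways is an illustrative assumption:

```python
import glob
import os
import tempfile
from pathlib import Path


def list_files_both_ways(rootdir):
    """Return (glob_names, pathlib_names) for regular files under rootdir."""
    pattern = os.path.join(rootdir, "**", "*")
    glob_names = sorted(
        os.path.basename(p)
        for p in glob.iglob(pattern, recursive=True)
        if os.path.isfile(p)
    )
    pathlib_names = sorted(
        p.name for p in Path(rootdir).rglob("*") if p.is_file()
    )
    return glob_names, pathlib_names


with tempfile.TemporaryDirectory() as tmp:
    for name in ("card.gif", ".card.gif"):
        Path(tmp, name).touch()
    glob_names, pathlib_names = list_files_both_ways(tmp)
    print(glob_names)     # names found by the glob module
    print(pathlib_names)  # names found by pathlib's rglob
```

If hidden files need to be rewritten too, the pathlib route is probably the simpler one; with the glob module you would have to add a second pattern for names starting with a dot.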

Neuron
Pedro Lobito