34

I have a root-ish directory containing multiple subdirectories, each of which contains a file named data.txt. I would like to write a script that takes the "root" directory as an argument, walks through all of the subdirectories, reads every "data.txt" it finds, and writes something from each data.txt file to an output file.

Here's a snippet of my code:

import os
import sys
rootdir = sys.argv[1]

with open('output.txt','w') as fout:
    for root, subFolders, files in os.walk(rootdir):
        for file in files:
            if (file == 'data.txt'):
                #print file
                with open(file,'r') as fin:
                    for lines in fin:
                        dosomething()

I've tested my dosomething() part and confirmed that it works when I run it on a single file. I've also confirmed that if I tell the script to print the file instead (the commented-out line), it prints 'data.txt'.

Right now if I run it Python gives me this error:

File "recursive.py", line 11, in <module>
    with open(file,'r') as fin:
IOError: [Errno 2] No such file or directory: 'data.txt'

I'm not sure why it can't find it -- after all, it prints out data.txt if I uncomment the 'print file' line. What am I doing incorrectly?

Mridang Agarwalla
Joe
    Just a style comment: once nesting gets this deep, it can be hard to read. To simplify, I'd put the inner part in a separate `def do_file(filename): ...` function. You can also do `if file != 'data.txt': continue` to skip non-matching files and save a level there. See also [PEP 20](http://www.python.org/dev/peps/pep-0020/): "Flat is better than nested". – Ben Hoyt Nov 26 '12 at 20:47
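A minimal sketch of the flattening that comment describes, assuming Python 3. The names `do_file` and `process_tree` are illustrative, and the `fout.write(line)` call is only a stand-in for the question's `dosomething()`, whose real behaviour isn't shown:

```python
import os


def do_file(path, fout):
    """Handle one data.txt; the body is a placeholder for dosomething()."""
    with open(path, 'r') as fin:
        for line in fin:
            # Assumption: copy each line through unchanged.
            fout.write(line)


def process_tree(rootdir, fout):
    """Walk rootdir and hand every data.txt to do_file()."""
    for root, subfolders, files in os.walk(rootdir):
        for name in files:
            if name != 'data.txt':
                continue  # skip other files instead of nesting an if-block
            # os.walk yields bare names, so join with root before opening.
            do_file(os.path.join(root, name), fout)
```

With this shape the main script reduces to opening `output.txt` and calling `process_tree(rootdir, fout)`.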

2 Answers

55

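You need to open the file by its full path. Your file variable is just a bare filename without a directory component, so open() looks for it in the current working directory rather than in the directory being walked. The root variable holds that directory: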

with open('output.txt','w') as fout:
    for root, subFolders, files in os.walk(rootdir):
        if 'data.txt' in files:
            with open(os.path.join(root, 'data.txt'), 'r') as fin:
                for lines in fin:
                    dosomething()
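To see why the bare name fails, here is a small diagnostic sketch (assuming Python 3; `check_paths` is a hypothetical helper, not part of the answer above). For each directory that contains the target file, it reports whether the bare filename and the joined path resolve from the current working directory:

```python
import os


def check_paths(rootdir, target='data.txt'):
    """Return (joined_path, bare_name_found, joined_path_found) tuples."""
    results = []
    for root, subfolders, files in os.walk(rootdir):
        if target in files:
            joined = os.path.join(root, target)
            # The bare name is resolved against the current working
            # directory, not against root, which is why open(file) failed.
            results.append((joined,
                            os.path.isfile(target),
                            os.path.isfile(joined)))
    return results
```

Unless the script happens to run from a directory that itself contains a data.txt, the second element of each tuple is False while the third is True.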
Martijn Pieters
    If, like me, anyone reading this wants to additionally filter the filenames being iterated, the answer to this question proved very helpful: http://stackoverflow.com/questions/2186525/use-a-glob-to-find-files-recursively-in-python – BigglesZX Oct 22 '13 at 10:18
    [`os.walk()` + follow symlinks](http://stackoverflow.com/questions/3771696/python-os-walk-follow-symlinks) addresses how to have this follow links. – Schorsch Aug 22 '14 at 15:02
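For the symlink case mentioned in that comment, a sketch assuming Python 3: `os.walk()` accepts a `followlinks` argument, which defaults to False, so symlinked directories are listed but not descended into. The function name `find_data_files` and its `follow_links` parameter are made up for illustration:

```python
import os


def find_data_files(rootdir, follow_links=False):
    """Return the path of every data.txt under rootdir.

    Pass follow_links=True to descend into symlinked directories as well.
    Caution: a link that points back at one of its own ancestors makes
    the walk loop, because os.walk does not track visited directories.
    """
    return [os.path.join(root, 'data.txt')
            for root, dirs, files in os.walk(rootdir, followlinks=follow_links)
            if 'data.txt' in files]
```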
0
[os.path.join(dirpath, filename) for dirpath, dirnames, filenames in os.walk(rootdir) 
                                 for filename in filenames]

A functional approach to get the tree looks shorter, cleaner and more Pythonic.

You can wrap the os.path.join(dirpath, filename) call in any function to process the files as you find them, or save the resulting list of paths for further processing.
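As a rough illustration of both options (assuming Python 3), the comprehension can be turned into a generator that filters by name, and each yielded path can then be passed to a processing function. `iter_paths` and `count_lines` are hypothetical names, and line counting is only an example of "processing":

```python
import fnmatch
import os


def iter_paths(rootdir, pattern='*'):
    """Yield the full path of each file under rootdir matching pattern."""
    for dirpath, dirnames, filenames in os.walk(rootdir):
        # fnmatch.filter keeps only the names matching the shell-style pattern.
        for filename in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, filename)


def count_lines(path):
    """Example per-file processing step: count the lines in one file."""
    with open(path) as fin:
        return sum(1 for _ in fin)
```

For the question's case, `{p: count_lines(p) for p in iter_paths(rootdir, 'data.txt')}` would map each data.txt path to its line count, while `list(iter_paths(rootdir))` gives the same list as the comprehension above.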

Himura