
I have code that reads the contents of a single text file, but I am having difficulty reading all files from all directories and combining their contents.

Here is what I have:

filename = '*'
filesuffix = '*'
location = os.path.join('Test', filename + "." + filesuffix)
Document = filename
thedictionary = {}
with open(location) as f:
    file_contents = f.read().lower().split(' ')  # split the text on spaces to make a list
    for position, item in enumerate(file_contents):
        if item in thedictionary:
            thedictionary[item].append(position)
        else:
            thedictionary[item] = [position]
wordlist = (thedictionary, Document)
#print wordlist
#print thedictionary

Note that I am trying to use the wildcard * for both the filename and the file suffix. I get the following error:

"IOError: [Errno 2] No such file or directory: 'Test/*.*'"

I am not sure this is even the right approach, but it seems that if I can get the wildcards working, it should work.

I have gotten this example to work: Python - reading files from directory file not found in subdirectory (which is there)

That example is a little different, and I don't know how to update it to read all files. I am thinking that in this initial piece of code:

previous_dir = os.getcwd()
os.chdir('testfilefolder')
#add something here?
for filename in os.listdir('.'):

I would need to add an outer for loop, but I don't quite know what to put in it.

Any thoughts?

Martijn Pieters
Relative0

1 Answer


Python's open() call does not expand wildcards in filenames. Use the glob module to match files in a single level of subdirectories, or os.walk() to traverse an arbitrarily nested directory structure.
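As a minimal sketch of that difference, the following creates a throwaway directory (the file names are made up for illustration) and shows that open() treats '*' as a literal character, while glob.glob() expands the pattern into real paths:

```python
import glob
import os
import tempfile

# Hypothetical scratch directory with two text files, for illustration only.
root = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    with open(os.path.join(root, name), 'w') as f:
        f.write('hello')

pattern = os.path.join(root, '*.txt')

# open() does not expand wildcards: it looks for a file literally named '*.txt'.
try:
    open(pattern)
    wildcard_opened = True
except (IOError, OSError):
    wildcard_opened = False

# glob.glob() expands the pattern into the list of matching paths.
matches = sorted(os.path.basename(p) for p in glob.glob(pattern))
```

Here wildcard_opened ends up False, while matches holds the names of both files.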

Opening all text files in all subdirectories, one level deep:

import glob
import os

for filename in glob.iglob(os.path.join('Test', '*', '*.txt')):
    with open(filename) as f:
        # One file is open here; handle it. The next iteration presents a new file.
        contents = f.read()

Opening all text files in an arbitrary nesting of directories:

import os
import fnmatch

for dirpath, dirs, files in os.walk('Test'):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            # One file is open here; handle it. The next iteration presents a new file.
            contents = f.read()
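
To tie this back to the question, here is one possible sketch that combines the os.walk() loop with the word-position dictionary from the question. The function name build_index and the demo files are hypothetical, and the choice to keep one dictionary per file path is an assumption about what "putting all of the contents together" should mean:

```python
import fnmatch
import os
import tempfile
from collections import defaultdict


def build_index(root):
    """Map each .txt path under root to a {word: [positions]} dictionary."""
    index = {}
    for dirpath, dirs, files in os.walk(root):
        for filename in fnmatch.filter(files, '*.txt'):
            path = os.path.join(dirpath, filename)
            positions = defaultdict(list)
            with open(path) as f:
                # split() with no argument splits on any run of whitespace,
                # which differs slightly from the question's split(' ').
                for position, word in enumerate(f.read().lower().split()):
                    positions[word].append(position)
            index[path] = dict(positions)
    return index


# Demo on a throwaway directory tree (made-up files, for illustration only).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub', 'deeper'))
with open(os.path.join(root, 'top.txt'), 'w') as f:
    f.write('The cat saw the dog')
with open(os.path.join(root, 'sub', 'deeper', 'nested.txt'), 'w') as f:
    f.write('Hello hello world')

index = build_index(root)
```

For the question's layout, the call would presumably be build_index('Test'). Keying the result by full path keeps files with the same name in different directories apart.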
jamylak
Martijn Pieters
  • Thank you, Martijn. I will try it out and see what happens. I am curious, though, as to why there are two different functions, glob and os.walk. After a little reading I see that glob lets you use wildcards but os.walk does not; instead you need to filter the results. I don't understand what is really going on, because I thought filtering the results is what wildcard expressions did. I found this post: http://stackoverflow.com/questions/8931099/quicker-to-os-walk-or-glob If you have any insight and time, any thoughts are appreciated. – Relative0 Apr 15 '13 at 17:34
  • `glob()` does not support arbitrary nested subdirectories (yet). That's the only difference here. `os.walk()` does but requires more filtering. Note that `glob()` uses the *same filter method* (the `fnmatch` module) already in its own implementation. – Martijn Pieters Apr 15 '13 at 17:36
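
To illustrate the point about filtering, here is a small sketch with made-up names: fnmatch.filter() applies a wildcard pattern to a plain list of names, which is the step os.walk() leaves to the caller. The second half assumes Python 3.5 or later, where glob.glob() accepts recursive=True and a '**' pattern that matches any number of directory levels, so the "(yet)" above no longer applies on those versions:

```python
import fnmatch
import glob
import os
import tempfile

# Wildcard filtering on a plain list of names (no file system access needed).
names = ['notes.txt', 'image.png', 'README', 'data.txt']
txt_names = fnmatch.filter(names, '*.txt')

# Throwaway nested tree with made-up files, for illustration only.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'a', 'b'))
for rel in ('top.txt', os.path.join('a', 'mid.txt'), os.path.join('a', 'b', 'deep.txt')):
    with open(os.path.join(root, rel), 'w') as f:
        f.write('x')

# '**' matches zero or more directory levels when recursive=True (Python 3.5+).
found = sorted(
    os.path.relpath(p, root)
    for p in glob.glob(os.path.join(root, '**', '*.txt'), recursive=True)
)
```

In this sketch txt_names keeps only the two .txt entries, and found lists all three text files regardless of depth.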