I have a folder called notes. Naturally the notes are categorized into folders, and within those folders there are sub-folders for sub-categories. My problem is that I have a function that walks through 3 levels of subdirectories:

def obtainFiles(path):
      list_of_files = {}
      for element in os.listdir(path):
          # if the element is an html file then..
          if element[-5:] == ".html":
              list_of_files[element] = path + "/" + element
          else: # element is a folder therefore a category
              category = os.path.join(path, element)
              # go through the category dir
              for element_2 in os.listdir(category):
                  dir_level_2 = os.path.join(path,element + "/" + element_2)
                  if element_2[-5:] == ".html":
                      print "- found file: " + element_2
                      # add the file to the list of files
                      list_of_files[element_2] = dir_level_2
                  elif os.path.isdir(element_2):
                      subcategory = dir_level_2
                      # go through the subcategory dir
                      for element_3 in os.listdir(subcategory):
                          subcategory_path = subcategory + "/" + element_3
                          if subcategory_path[-5:] == ".html":
                              print "- found file: " + element_3
                              list_of_files[element_3] = subcategory_path
                          else:
                              for element_4 in os.listdir(subcategory_path):
                                  print "- found file:" + element_4

Note that this is still very much a work in progress, and it's very ugly in my eyes. What I am trying to achieve is to go through all the folders and sub-folders and put all the file names in a dictionary called "list_of_files", with the name as the key and the full path as the value. The function doesn't quite work yet, but I was wondering how one would use the os.walk function to do a similar thing?

Thanks

chutsu
    possible duplicate of [Directory listing in Python](http://stackoverflow.com/questions/120656/directory-listing-in-python) – kennytm May 27 '10 at 16:11
    In order to answer this question, you must first understand recursion. See also: http://stackoverflow.com/questions/2922783/how-do-you-walk-through-the-directories-using-python – Daniel Pryden May 27 '10 at 16:15

4 Answers

Based on your short description, something like this should work:

import os

list_of_files = {}
for (dirpath, dirnames, filenames) in os.walk(path):
    for filename in filenames:
        if filename.endswith('.html'):
            # key: file name, value: full path to the file
            list_of_files[filename] = os.sep.join([dirpath, filename])
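
One caveat with a dictionary keyed on the bare file name: two notes with the same name in different sub-folders overwrite each other. A minimal sketch, assuming you would rather keep every match, collects a list of paths per name instead. The function name `collect_html_files` and its `extension` parameter are made up for illustration.

```python
import os
from collections import defaultdict


def collect_html_files(root, extension=".html"):
    """Map each matching file name to a list of every full path found under root."""
    found = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(extension):
                # append rather than assign, so duplicate names are all kept
                found[filename].append(os.path.join(dirpath, filename))
    return dict(found)
```

If names are known to be unique, the plain dictionary is simpler and does the job.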
ig0774

An alternative is to use a generator, building on @ig0774's code:

import os

def walk_through_files(path, file_extension='.html'):
    for (dirpath, dirnames, filenames) in os.walk(path):
        for filename in filenames:
            if filename.endswith(file_extension):
                yield os.path.join(dirpath, filename)

and then

for fname in walk_through_files(path):  # path is the root directory to search
    print(fname)
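
To get the name-to-path dictionary the question asks for, the generator can feed a dict comprehension. This is only a sketch: `files_by_name` is a made-up helper name, and the generator is repeated here just so the snippet runs on its own.

```python
import os


def walk_through_files(path, file_extension=".html"):
    """Yield the full path of every file under path with the given extension."""
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith(file_extension):
                yield os.path.join(dirpath, filename)


def files_by_name(path, file_extension=".html"):
    """Build {file name: full path} from the generator above (hypothetical helper)."""
    return {os.path.basename(p): p for p in walk_through_files(path, file_extension)}
```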
muon

I've come across this question multiple times, and none of the answers satisfied me, so I created a script for it (file_walker, which is not part of the standard library). I find Python cumbersome when it comes to walking through directories.

Here's how it can be used:

import file_walker


for f in file_walker.walk("/a/path"):
     print(f.name, f.full_path) # Name is without extension
     if f.isDirectory: # Check if object is directory
         for sub_f in f.walk(): # Easily walk on new levels
             if sub_f.isFile: # Check if object is file (= !isDirectory)
                 print(sub_f.extension) # Print file extension
                 with sub_f.open("r") as open_f: # Easily open file
                     print(open_f.read())
                
            
NelsonGon
Nearoo

You could do this:

list_of_files = {filename: os.sep.join((dirpath, filename))
                 for (dirpath, dirnames, filenames) in os.walk(path)
                 for filename in filenames
                 if filename[-5:] == '.html'}
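
On Python 3, `pathlib` offers another route that is not part of the original answer: `Path.rglob` searches recursively by pattern. A minimal sketch, with `html_files` as a hypothetical function name:

```python
from pathlib import Path


def html_files(root):
    """Map each .html file name under root to its full path, using pathlib."""
    return {
        p.name: str(p)
        for p in Path(root).rglob("*.html")
        if p.is_file()  # skip any directory whose name happens to end in .html
    }
```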
psmears