How to take input of a directory

Question

What I'm trying to do is trawl through a directory of log files whose names look like "filename001.log". There can be hundreds of files in a directory.

The code I want to run against each file checks that the 8th position of every log line contains a number. I have a suspicion that a non-digit is throwing off our parser. Here's some simple code I'm using to check this:

# import re
from urlparse import urlparse

a = '/folderA/filename*.log' #<< currently this only does 1 file
b = '/folderB/' #<< I'd like it to write the same file name as it read
with open(b, 'w') as newfile, open(a, 'r') as oldfile:
    data = oldfile.readlines()
    for line in data:
        parts = line.split()
        status = parts[8]  # value of 8th position in the log file
        isDigit = status.isdigit()

        if not isDigit:
            print " Not A Number :", status
            newfile.write(status)

My problem is:

How do I tell it to read all the files in a directory? (The above really only works for 1 file at a time)
If I find something that is not a number, I would like to write that character into a file in a different folder but with the same name as the log file. For example, say I find that filename002.log has a "*" in one of its log lines. I would like folderB/filename002.log to be created and the non-digit character to be written to it.

It sounds simple enough, I'm just not very good at coding.

Maybe this helps. You can get a list of all files in the current directory, then read them one at a time. http://stackoverflow.com/questions/3207219/how-to-list-all-files-of-a-directory-in-python – giaosudau, Jan 09 '16 at 04:48

ShadowRanger · Accepted Answer · 2016-01-09T05:02:20.520

To read files in one directory matching a given pattern and write to another, use the glob module and the os.path functions to construct the output files:

import glob
import os

srcpat = '/folderA/filename*.log'
dstdir = '/folderB'
for srcfile in glob.iglob(srcpat):
   if not os.path.isfile(srcfile): continue

   dstfile = os.path.join(dstdir, os.path.basename(srcfile))
   with open(srcfile) as src, open(dstfile, 'w') as dst:
       for line in src:
           parts = line.split()
           status = parts[8]  # value of 8th position in the log file
           if not status.isdigit():
               print " Not A Number :", status
               dst.write(status)  # Or print >>dst, status if you want newline

This will create empty output files even when no bad entries are found. There are two ways to avoid that. You can wait until you're finished processing a file (and the with block has closed it), check the size of the output file, and delete it if it is empty. Alternatively, you can move to a lazy approach: unconditionally delete any existing output file before beginning iteration, but don't open it. Only when you get a bad value do you open the file (for append instead of write, so output from earlier loop iterations isn't discarded), write to it, and allow it to close.
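A minimal sketch of the lazy approach, with some assumptions of mine that are not in the answer: it keeps the output handle open after the first bad value rather than reopening in append mode for each one, it writes one value per line, it skips lines that are too short to have the field, and the name check_logs and the field parameter are hypothetical. It avoids the print statement so it should run on both Python 2 and Python 3.

```python
import glob
import os


def check_logs(srcpat, dstdir, field=8):
    """Write non-digit values of one whitespace-separated field to dstdir.

    An output file is created only for inputs with at least one bad value.
    Returns the list of output paths that were created.
    """
    created = []
    for srcfile in glob.iglob(srcpat):
        if not os.path.isfile(srcfile):
            continue
        dstfile = os.path.join(dstdir, os.path.basename(srcfile))
        # Remove stale output from an earlier run so results don't mix.
        if os.path.exists(dstfile):
            os.remove(dstfile)
        dst = None  # opened lazily, on the first bad value only
        try:
            with open(srcfile) as src:
                for line in src:
                    parts = line.split()
                    if len(parts) <= field:
                        continue  # assumption: ignore lines that are too short
                    status = parts[field]
                    if not status.isdigit():
                        if dst is None:
                            dst = open(dstfile, 'w')
                            created.append(dstfile)
                        dst.write(status + '\n')
        finally:
            if dst is not None:
                dst.close()
    return created
```

Calling check_logs('/folderA/filename*.log', '/folderB') would then leave files in /folderB only for the logs that actually contain a non-digit value.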

Diane M · Answer 2 · 2016-01-09T05:33:46.457

Import os and use: for filename in os.listdir('path'):. This yields the name of every entry in the directory, including the names of subdirectories (it does not descend into them), so filter with os.path.isfile if you only want files.
Simply open a second file with the correct path. Since you already have the filename from iterating with the above method, you only have to replace the directory. You can use os.path.join for that.
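The two steps above can be sketched roughly as follows. The helper name log_pairs and its prefix/suffix parameters are hypothetical, chosen to match the "filename001.log" naming from the question; os.listdir returns bare names, so both the source and the destination path are built with os.path.join.

```python
import os


def log_pairs(srcdir, dstdir, prefix='filename', suffix='.log'):
    """Return sorted (source, destination) path pairs for matching log files."""
    pairs = []
    for name in os.listdir(srcdir):  # bare names, not full paths
        srcfile = os.path.join(srcdir, name)
        if not os.path.isfile(srcfile):
            continue  # skip subdirectories and other non-files
        if name.startswith(prefix) and name.endswith(suffix):
            # Same file name, different directory.
            pairs.append((srcfile, os.path.join(dstdir, name)))
    return sorted(pairs)
```

Each pair can then be opened the same way as in the question, with the first path for reading and the second for writing.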

edited Jan 09 '16 at 05:33

answered Jan 09 '16 at 04:55


Why bother with `os.walk` when they only need files in a single directory, not the whole tree? And why use append mode when they're producing a single output file for each input file (if a single shared output file was being used and reopened, sure, but otherwise?). – ShadowRanger Jan 09 '16 at 04:58
