
So I want to open each file in a directory (there are 5 plain text documents in it) and do something like finding specific words in each file.

This is the code I used, but the results all come out stacked together instead of being grouped by file.

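import os
import re

# Raw string so the backslashes in the Windows path are kept literally
path = r'C:\Python27\projects\Alabama\New folder'

# \b marks word boundaries, so "may" does not match inside "mayor"
pattern = re.compile(r"\bshall\b")
pattern1 = re.compile(r"\bmay\b")
pattern2 = re.compile(r"\bmust\b")

for filenames in os.listdir(path):
    # os.listdir returns bare names; join with the folder so open() finds the file
    with open(os.path.join(path, filenames), 'r') as myfile:
        for string in myfile:
            # matches are found and printed per line, with no file name shown
            m = pattern.findall(string)
            m1 = pattern1.findall(string)
            m2 = pattern2.findall(string)
            k = len(m)
            k1 = len(m1)
            k2 = len(m2)
            print m, m1, m2, k, k1, k2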

My question is: how do I run the re.findall search on every file in the directory separately, instead of getting one stacked-up output?

Thanks!

CHballer
  • Are you asking how you should separate the output of your code for different files? – Ali May 10 '16 at 05:51
  • You cannot open and read Word files (`.doc` or `.docx`) in the manner you are trying, because these are _binary files_ in a specific format. They are not plain text files that you can simply open and read line by line. – Burhan Khalid May 10 '16 at 05:56
  • @BurhanKhalid is right, you may want to consider using something such as [AntiWord](https://en.wikipedia.org/wiki/Antiword) to get a plain text version of your file, or use the third-party `python-docx` package to extract all the text from a `.docx` file. – Anon957 May 10 '16 at 06:14
  • @BurhanKhalid you are right, so instead of reading the lines, is there a way to open a `doc` or `docx` file and use the re.findall function on it? – CHballer May 10 '16 at 06:15
  • @TianMa yes, Python needs to be able to properly read your files before it can process them. Secondly, to 'separate the output of your code' you could simply use `myfile.name` to print the file name first, then print the results for that file. Alternatively, you could build a dict mapping each filename to its results. – Anon957 May 10 '16 at 06:23
  • @alivar yes, that's my question: how to separate the output for the different files in the folder. – CHballer May 10 '16 at 06:24
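A minimal sketch of the per-file approach the comments describe, assuming every regular file in the folder is plain text (subdirectories are skipped). It reads each file whole, so each word is counted once per file instead of once per line, prints the file name as a header before that file's counts, and also returns a dict mapping each file name to its counts. `PATTERNS`, `count_words_per_file` and `print_results` are hypothetical names chosen for illustration. Matching stays case-sensitive, as in the question, so "Shall" at the start of a sentence is not counted unless you add `re.IGNORECASE`.

```python
import os
import re

# The three patterns from the question, kept in a fixed order for printing.
# \b marks word boundaries, so "may" does not match inside "mayor".
PATTERNS = [
    ("shall", re.compile(r"\bshall\b")),
    ("may", re.compile(r"\bmay\b")),
    ("must", re.compile(r"\bmust\b")),
]


def count_words_per_file(path):
    """Return {filename: {word: count}} for every regular file in path."""
    results = {}
    for name in sorted(os.listdir(path)):
        # os.listdir gives bare names, so join them with the folder path
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories
        with open(full_path, "r") as myfile:
            text = myfile.read()  # whole file at once, not line by line
        results[name] = dict((word, len(p.findall(text))) for word, p in PATTERNS)
    return results


def print_results(results):
    """Print each file name followed by its own counts."""
    for name in sorted(results):
        print(name)
        for word, _ in PATTERNS:
            print("    %s: %d" % (word, results[name][word]))


# Folder from the question; only scanned if it exists on this machine.
folder = r"C:\Python27\projects\Alabama\New folder"
if os.path.isdir(folder):
    print_results(count_words_per_file(folder))
```

If you need the matched words themselves rather than counts, store `p.findall(text)` instead of its length.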

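On the `.doc`/`.docx` point raised in the comments: a `.docx` file is a zip archive whose body text sits in the XML part `word/document.xml`, so the standard library alone can pull out text to search with regular expressions. This is a swapped-in alternative to the Python docx module mentioned in the comments, not that package's API, and it is only a sketch for simple documents: it handles `.docx` only (legacy binary `.doc` files need a converter such as AntiWord), reads the main body but not headers, footers or footnotes, and ignores tabs and line breaks inside a paragraph. `docx_text` and `count_words` are hypothetical helper names.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of WordprocessingML, the markup inside .docx files
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(filename):
    """Return the body text of a .docx file, one paragraph per line."""
    # A .docx is a zip archive; the main body lives in word/document.xml
    with zipfile.ZipFile(filename) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):  # <w:p> is a paragraph
        # <w:t> elements hold the actual text runs of the paragraph
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def count_words(text, words=("shall", "may", "must")):
    """Count whole-word, case-sensitive matches of each word in text."""
    return dict((w, len(re.findall(r"\b%s\b" % re.escape(w), text))) for w in words)
```

For each `.docx` name returned by `os.listdir(path)`, calling `count_words(docx_text(os.path.join(path, name)))` gives that file's counts.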