4

I have a folder with many text files (EPA10.txt, EPA55.txt, EPA120.txt..., EPA150.txt). I have 2 strings that are to be searched in each file and the result of the search is written in a text file result.txt. So far I have it working for a single file. Here is the working code:

if 'LZY_201_335_R10A01' and 'LZY_201_186_R5U01' in open('C:\\Temp\\lamip\\EPA150.txt').read():
    with open("C:\\Temp\\lamip\\result.txt", "w") as f:
        f.write('Current MW in node is EPA150')
else:
    with open("C:\\Temp\\lamip\\result.txt", "w") as f:
        f.write('NOT EPA150')

Now I want this to be repeated for all the text files in the folder. Please help.

Marco Bonelli
slyclam

3 Answers

8

Given that you have a number of files named something like EPA1.txt to EPA150.txt, but you don't know all the names in advance, you can put them all in one folder and use the os.listdir() function to get the list of filenames in it, for example listdir("C:/Temp/lamip").

Also, your if statement is wrong: each string needs its own `in` test. You should do this instead:

text = file.read()
if "string1" in text and "string2" in text:
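
To see why the original condition misbehaves: `'a' and 'b' in text` is parsed as `'a' and ('b' in text)`, and a non-empty string literal is always truthy, so only the second string is really tested. A small illustration, using made-up sample text rather than a real EPA file:

```python
# Hypothetical sample text: it contains the second string but not the first.
text = "header\nLZY_201_186_R5U01\nfooter"

# Parsed as: 'LZY_201_335_R10A01' and ('LZY_201_186_R5U01' in text).
# The first operand is a non-empty string, so it is always truthy.
wrong = bool('LZY_201_335_R10A01' and 'LZY_201_186_R5U01' in text)

# Here each string gets its own membership test.
right = 'LZY_201_335_R10A01' in text and 'LZY_201_186_R5U01' in text

print(wrong)  # True, even though the first string is missing
print(right)  # False
```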

Here's the code:

from os import listdir

with open("C:/Temp/lamip/result.txt", "w") as f:
    for filename in listdir("C:/Temp/lamip"):
        with open('C:/Temp/lamip/' + filename) as currentFile:
            text = currentFile.read()
            if ('LZY_201_335_R10A01' in text) and ('LZY_201_186_R5U01' in text):
                f.write('Current MW in node is ' + filename[:-4] + '\n')
            else:
                f.write('NOT ' + filename[:-4] + '\n')

PS: You can use / instead of \\ in your paths; Windows accepts forward slashes as path separators, so Python opens such paths without trouble.
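
One thing to watch with the loop above: result.txt lives in the same folder, so listdir() returns it too and it gets checked like any other file. A possible variant, sketched under the assumption that only .txt files matter, skips the result file and builds paths with os.path.join. The function name scan_folder and its parameters are made up for illustration:

```python
import os

def scan_folder(folder, needles, result_name="result.txt"):
    """Report, for each .txt file in `folder`, whether it contains every needle."""
    lines = []
    for filename in sorted(os.listdir(folder)):
        # Skip anything that is not a text file, and the result file itself.
        if not filename.endswith(".txt") or filename == result_name:
            continue
        with open(os.path.join(folder, filename)) as current_file:
            text = current_file.read()
        stem = os.path.splitext(filename)[0]  # "EPA150.txt" -> "EPA150"
        if all(needle in text for needle in needles):
            lines.append("Current MW in node is " + stem)
        else:
            lines.append("NOT " + stem)
    # Write the report only after every input file has been read.
    with open(os.path.join(folder, result_name), "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

# Example call (path taken from the question):
# scan_folder("C:/Temp/lamip", ["LZY_201_335_R10A01", "LZY_201_186_R5U01"])
```

Collecting the lines first and writing them at the end keeps the result file closed while the other files are being read.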

Marco Bonelli
  • The file numberings are not sequential, i.e. not EPA1.txt ... EPA150.txt. They are random but the highest is 150. Will this code still work? – slyclam Jan 16 '15 at 10:31
  • `with open('C:/Temp/lamip/MyFolder/' + filename).read() as currentFile` is still wrong. – Ashwini Chaudhary Jan 16 '15 at 11:16
  • My script is in `C:\Python27\Scripts` and the folder containing the text files is `C:\Temp\lamip`. This folder also has the result.txt file. What modifications should I do? – slyclam Jan 16 '15 at 11:24
  • Traceback (most recent call last): File "C:\Temp\lamip.py", line 41, in f.writeline('NOT ' + filename[:-4]) AttributeError: 'file' object has no attribute 'writeline' – slyclam Jan 19 '15 at 06:42
  • @slyclam oh I forgot that you can't use writeline, I edited my answer: just use .write and add a \n character at the end – Marco Bonelli Jan 19 '15 at 07:09
  • It is working but only for a single string i.e only if I do this `if 'LZY_201_335_R10A01' in currentFile.read():## and 'LZY_201_186_R5U01' in currentFile.read():` – slyclam Jan 19 '15 at 07:23
  • @slyclam what do you mean? – Marco Bonelli Jan 19 '15 at 07:30
  • Your code is executing without errors. But, I am getting `NOT ` only. Although the strings are present in one of the files. When I put only one string (for ex: LZY_201_335_R10A01) then I get the desired output. But when I put 2 strings like in your code, then I am getting NOT EPA150. – slyclam Jan 19 '15 at 07:34
  • @slyclam that's not my error. Does your EPA150 contain **both** strings? Maybe you wanted to check if the text contained **at least** one of them? If so you have to use **or** instead of **and**. – Marco Bonelli Jan 19 '15 at 07:40
  • the EPA150.txt has both the strings. But they are not in same line. – slyclam Jan 19 '15 at 07:42
  • @slyclam maybe it was because of the .read() method being called two times. I edited the code assigning the text to a temporary variable to check. See if it works... – Marco Bonelli Jan 19 '15 at 07:47
3

Modularise! Modularise!

Well, not in the sense of writing distinct Python modules, but in the sense of isolating the different tasks at hand.

  1. Find the files you wish to search.
  2. Read the file and locate the text.
  3. Write the result into a separate file.

Each of these tasks can be solved independently. For example, to list the files you have os.listdir, whose output you might want to filter.

For step 2, it does not matter whether you have 1 or 1,000 files to search. The routine is the same. You merely have to iterate over each file found in step 1. This indicates that step 2 could be implemented as a function that takes the filename (and possible search-string) as argument, and returns True or False.

Step 3 is the combination of each element from step 1 and the result of step 2.

The result:

import os

files = [fn for fn in os.listdir('C:/Temp/lamip') if fn.endswith('.txt')]
# perhaps filter `files`

def does_fn_contain_string(filename):
  with open('C:/Temp/lamip/' + filename) as blargh:
    content = blargh.read()
    # use `or` instead of `and` if finding either string is enough
    return 'string1' in content and 'string2' in content

with open('results.txt', 'w') as output:
  for fn in files:
    if does_fn_contain_string(fn):
      # {0} refers to the first (and only) argument passed to format()
      output.write('Current MW in node is {0}\n'.format(fn[:-4]))
    else:
      output.write('NOT {0}\n'.format(fn[:-4]))
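
As suggested for step 2, the search strings can also be passed in as arguments instead of being hard-coded. A minimal sketch of that idea; the name file_contains and the require_all flag are illustrative assumptions:

```python
def file_contains(path, needles, require_all=True):
    """Return True if the file at `path` contains all (or any) of `needles`."""
    with open(path) as handle:
        content = handle.read()
    # all() gives the "and" behaviour, any() gives the "or" behaviour.
    check = all if require_all else any
    return check(needle in content for needle in needles)
```

With require_all=True both strings must be present somewhere in the file (not necessarily on the same line); with require_all=False one of them is enough.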
MrGumble
  • Traceback (most recent call last): File "C:\Python27\Scripts\lamip.py", line 42, in if does_fn_contain_string(fn): File "C:\Python27\Scripts\lamip.py", line 37, in does_fn_contain_string with open('C:/Temp/lamip/' + filename).read() as content: AttributeError: __exit__ – slyclam Jan 16 '15 at 11:52
  • Sorry. Thought we could use a ``open(...).read()`` in the context manager (the ``with``-construct). It appears we cannot. – MrGumble Jan 16 '15 at 12:23
  • Please help in solving the problem. – slyclam Jan 16 '15 at 12:28
  • Traceback (most recent call last): File "C:\Temp\lamip.py", line 47, in output.write('NOT {1}\n'.format(fn[:-4])) IndexError: tuple index out of range – slyclam Jan 19 '15 at 06:51
  • I'll leave that one for you to debug, as it is a trivial but common error. Hint: it's on the "fn[:-4]" part, so look at which values "fn" is taking. – MrGumble Jan 19 '15 at 08:40
0

You can do this by creating a for loop that runs through all your .txt files in the current working directory.

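import os

with open("result.txt", "w") as resultfile:
    for result in [txt for txt in os.listdir(os.getcwd()) if txt.endswith(".txt")]:
        if result == "result.txt":
            continue  # skip the output file itself
        with open(result) as currentfile:
            text = currentfile.read()
        # test each string separately; 'a' and 'b' in text only checks 'b'
        if 'LZY_201_335_R10A01' in text and 'LZY_201_186_R5U01' in text:
            resultfile.write('Current MW in node is {0}\n'.format(result[:-4]))
        else:
            resultfile.write('NOT {0}\n'.format(result[:-4]))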
Tarjei
  • I am getting incorrect syntax. There seems to be an error for the **:** in the for loop end. `for result in [txt for txt in os.listdir(os.getwcd()) if txt.endswith(".txt"):` – slyclam Jan 16 '15 at 10:40
  • @slyclam it was lacking the closing bracket for the list comprehension. I've edited the answer. – Tarjei Jan 16 '15 at 10:42
  • Traceback (most recent call last): File "C:\Python27\Scripts\lamip.py", line 36, in for result in [txt for txt in os.listdir(os.getwcd()) if txt.endswith(".txt")]: AttributeError: 'module' object has no attribute 'getwcd' – slyclam Jan 16 '15 at 10:47
  • @slyclam Apologies for all the typos, it's supposed to be `os.getcwd()` as in "get current working directory". Hope it works now. Good luck! – Tarjei Jan 16 '15 at 10:49
  • Nope. Doesn't work. Nothing gets written in the result.txt file. Please help. – slyclam Jan 16 '15 at 11:33
  • @slyclam, this is getting embarrassing for me, set the `"w"` attribute for the `result.txt` file. – Tarjei Jan 19 '15 at 06:33
  • Traceback (most recent call last): File "C:\Temp\lamip.py", line 41, in resultfile.write('NOT {1}'.format(result[:-4])) IndexError: tuple index out of range – slyclam Jan 19 '15 at 06:41
  • @slyclam ... and Python indexes from zero – Tarjei Jan 19 '15 at 13:29