I'd like to count the lines in a file that contain any of several patterns. If any of the patterns is found in a line, the counter should increase by 1, so that in the end I have the total number of lines that include at least one of the patterns I defined. I'm stuck on how to search for multiple patterns in a single line while increasing the count only once when any of them matches. Can anyone advise on this? Also, can I write a single pattern that covers the three patterns I defined?

def sample_output(input_file):
    lines_detect_pattern = 0
    lines_not_detect_pattern = 0

    patterns =['HELLO\(L[0-9]\)\:\[APP*?\]',
               'HELLO\(L[0-9]\)\:\[Unknown\]\[APP.*?\]',
               'HELLO\(L[0-9]\)\:\[Known\]\[APP.*?\]',
              ]

    myfile = open(input_file, 'r')
    outfile = open(final_file,'a+')
    for line in myfile:
        for pattern in patterns:
            if pattern.search(line)

    outfile.write("Total number of system passed PMEM : %s \n" %pmem_pass)
    outfile.write("Total number of system failed PMEM : %s \n" %pmem_fail)

    outfile.close()    
    myfile.close()
Joohun Lee

3 Answers

As soon as pattern.search(line) succeeds (with pattern being a compiled regex), you should increment a counter and immediately break out of the inner loop.

So, something along the lines of:

if pattern.search(line):
    count += 1
    break

should do the work.
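
A minimal sketch of that loop, assuming the patterns are compiled with re.compile first. The function name count_matching_lines and the sample lines are made up for illustration, and the first pattern uses APP.*? on the assumption that APP*? in the question was a typo:

```python
import re

# Patterns from the question, written as raw strings and compiled once.
# Assumption: the first pattern was meant to read APP.*? like the others.
patterns = [
    re.compile(r'HELLO\(L[0-9]\):\[APP.*?\]'),
    re.compile(r'HELLO\(L[0-9]\):\[Unknown\]\[APP.*?\]'),
    re.compile(r'HELLO\(L[0-9]\):\[Known\]\[APP.*?\]'),
]

def count_matching_lines(lines):
    """Count the lines that match at least one pattern."""
    count = 0
    for line in lines:
        for pattern in patterns:
            if pattern.search(line):
                count += 1
                break  # stop at the first hit so the line is counted once
    return count

# Hypothetical sample input; the third line matches more than one pattern.
sample = [
    "HELLO(L1):[APP1] started",
    "HELLO(L2):[Unknown][APP2] started",
    "HELLO(L3):[Known][APP3] HELLO(L3):[APP3]",
    "nothing to see here",
]
print(count_matching_lines(sample))
```

The break is what keeps the third sample line from being counted twice.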

Edit:

Concerning the other question: since the patterns are fairly similar, you CAN combine them into one using the alternation (pipe) operator. I think this would work, but try it for yourself:

HELLO\(L[0-9]\):(?:\[Unknown\]|\[Known\])?\[APP.*?\]

If this (or a variant of this) works, you could remove the inner loop completely :)
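
One way to check a combined pattern of this kind is to run it against a few hand-written lines. The sketch below escapes every literal bracket and puts the optional [Unknown]/[Known] part in a non-capturing group; the sample strings are invented, not taken from the asker's file:

```python
import re

# Single pattern covering all three cases: the [Unknown] / [Known] tag
# is optional, and the [APP...] tag is required.
combined = re.compile(r'HELLO\(L[0-9]\):(?:\[Unknown\]|\[Known\])?\[APP.*?\]')

# Hypothetical inputs mapped to whether they are expected to match.
samples = {
    "HELLO(L1):[APP1]": True,
    "HELLO(L2):[Unknown][APP2]": True,
    "HELLO(L3):[Known][APP3]": True,
    "HELLO(L4):[Other][APP4]": False,  # tag is neither Unknown nor Known
    "HELLO(LX):[APP5]": False,         # level is not a digit
}

results = {text: bool(combined.search(text)) for text in samples}
for text, matched in results.items():
    print(text, "->", matched)
```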

There is also a great graphical tool called Debuggex for working through semi-complex regular expressions, and Regex101 lets you test your expression online.

Mirza

First of all, you can't use pattern.search(line): pattern is a string and has no search method.
You need to use re.search, or re.compile followed by the search method on the compiled regex. Either choice works equally well for your code. As the docs say:

Note: The compiled versions of the most recent patterns passed to re.match(), re.search() or re.compile() are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.

Like this:

import re
# Raw string, so the backslashes reach the regex engine unchanged.
pattern = r'HELLO\(L[0-9]\):(?:\[(?:Unk|K)nown\])?\[APP.*?\]'

...

for line in myfile:
    if re.search(pattern, line):
        lines_detect_pattern = lines_detect_pattern + 1
    else:
        lines_not_detect_pattern = lines_not_detect_pattern + 1

For opening files, you could use the with statement, which closes the file for you even if an error occurs. Read about it in this answer or in the docs.
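
As a rough sketch of how the counting loop could look with the with statement: the function name count_lines and the temporary demo file are assumptions for illustration, not part of the original code.

```python
import os
import re
import tempfile

pattern = re.compile(r'HELLO\(L[0-9]\):(?:\[(?:Unk|K)nown\])?\[APP.*?\]')

def count_lines(input_file):
    """Return (lines matching the pattern, lines not matching it)."""
    detected = 0
    not_detected = 0
    # The file is closed automatically when the with block exits.
    with open(input_file, 'r') as myfile:
        for line in myfile:
            if pattern.search(line):
                detected += 1
            else:
                not_detected += 1
    return detected, not_detected

# Demo on a throwaway file with made-up content.
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as tmp:
    tmp.write("HELLO(L1):[APP1]\nHELLO(L2):[Known][APP2]\nunrelated line\n")
    path = tmp.name

result = count_lines(path)
os.remove(path)
print(result)
```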

Tithen-Firion

You can use any() with your original code, which short-circuits on the first match. You also need to actually use re, i.e. re.search(pattern, line):

sm = sum(1 for line in myfile if any(re.search(pattern, line) for pattern in patterns))

You could also join the patterns into a single alternation and compile it once:

r = re.compile("|".join(patterns))

sm = sum(1 for line in myfile if r.search(line))

A simple example:

import re

patterns = [r"\d+", "foo"]

r = re.compile("|".join(patterns))
myfile = ["23", "foob", "bar", "hello world"]
sm = sum(1 for line in myfile if r.search(line))
print(sm)

Output:

  2
Padraic Cunningham
  • this is what I was looking for.. thanks for letting me know the use of "any". I was looking a way to find all patterns from single line at one trial so I don't increase the counter whenever if find one matches to pattern. – Joohun Lee Sep 04 '15 at 00:45
  • Do you want to match all patterns for the count to increase or any? – Padraic Cunningham Sep 04 '15 at 09:42