I have a Python log file (let's call it logfile.log) with a bunch of Python errors. One particular error (let's call it blah blah error) is raised from several different methods. The entries look something like this:

CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
  File "example1.py", line 100, in method1
  File "example2.py", line 200, in method2
  File "example3.py", line 300, in method3
pythonerror.Error: blah blah error

In the sample above, the source of the error is method3, but in other entries it can be any of several different methods. I want to go through the entire log file and count how many times each method appears as the source of one of these errors, if it appears at all. Is this possible using regex? What would I need to do to accomplish this?

NOTE: The log file doesn't only contain this particular error, so the method might appear in other errors. I want to get the count of it only if it's within that particular error and only if it's the source of the error (second to last line in the above example).

User

2 Answers

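You can do that with a regex. The pattern below captures the method name on the line directly before the error line and prints it for every match: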

import re

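# Capture the method name on the traceback line that directly precedes
# the error line; re.MULTILINE lets ^ and $ match at every line break.
regex = re.compile(
    r"^.*line \d+, in (?P<method>.*)$\n^pythonerror\.Error: blah blah error$",
    re.MULTILINE,
)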

test_str = """
CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
    File "example1.py", line 100, in method1
    File "example2.py", line 200, in method2
    File "example3.py", line 300, in method3
pythonerror.Error: blah blah error

CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
    File "example2.py", line 200, in method2
    File "example3.py", line 300, in other_method
pythonerror.Error: blah blah error

CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
    File "example3.py", line 300, in other_method
pythonerror.Error: blah blah error
"""

# Print the source method of every matching error entry.
for match in regex.finditer(test_str):
    print(match.group("method"))
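
The loop above prints each source method. To get the per-method counts the question asks for, the same matches can be fed into `collections.Counter`. This is only a minimal sketch, assuming the whole log fits in memory; `SOURCE_RE`, `count_error_sources` and the sample text are names made up for illustration.

```python
import re
from collections import Counter

# Same pattern as above: the method on the last traceback line
# directly before the chosen error line.
SOURCE_RE = re.compile(
    r"^.*line \d+, in (?P<method>.*)$\n^pythonerror\.Error: blah blah error$",
    re.MULTILINE,
)


def count_error_sources(log_text):
    """Count how often each method is the last frame before the error line."""
    return Counter(m.group("method") for m in SOURCE_RE.finditer(log_text))


# Hypothetical sample: two entries with the chosen error, one with another error.
sample = """\
CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
  File "example2.py", line 200, in method2
  File "example3.py", line 300, in method3
pythonerror.Error: blah blah error

CRITICAL - Unexpected Error: other error
Traceback (most recent call last):
  File "example3.py", line 300, in method3
pythonerror.Error: other error

CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
  File "example3.py", line 300, in other_method
pythonerror.Error: blah blah error
"""

counts = count_error_sources(sample)
for method, n in counts.most_common():
    print(method, n)
```

For the real file, the text could be read with something like `Path("logfile.log").read_text()` (from `pathlib`) and passed to `count_error_sources`.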
tfeldmann

I hope I've understood you correctly. This script counts the errors for each source method. The error to look for must be defined up front (blah blah error in this case) (regex101):

data = '''CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
  File "example1.py", line 100, in method1
  File "example2.py", line 200, in method2
  File "example3.py", line 300, in method3
pythonerror.Error: blah blah error

CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
  File "example1.py", line 100, in method1
  File "example3.py", line 300, in method3
  File "example2.py", line 200, in method2
pythonerror.Error: blah blah error

CRITICAL - Unexpected Error: other error
Traceback (most recent call last):
  File "example1.py", line 100, in method1
  File "example2.py", line 200, in method2
  File "example3.py", line 300, in method3
pythonerror.Error: other error

CRITICAL - Unexpected Error: blah blah error
Traceback (most recent call last):
  File "example1.py", line 100, in method1
  File "example2.py", line 200, in method2
  File "example3.py", line 300, in method3
pythonerror.Error: blah blah error
'''

import re

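error = 'blah blah error'

# Build the pattern from the escaped error text; group 1 is the last word
# before the closing "pythonerror.Error: ..." line, i.e. the source method.
pattern = (
    r'CRITICAL - Unexpected Error: {error}'
    r'(?:(?!pythonerror\.Error: {error}).)*'
    r'(?:\b([^\s]+)\npythonerror\.Error: {error})'
).format(error=re.escape(error))

m = re.findall(pattern, data, flags=re.DOTALL)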

method_error_count = {}
for g in m:
    method_error_count[g] = method_error_count.get(g, 0) + 1

print(method_error_count)

Prints:

{'method3': 2, 'method2': 1}
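
To unpack how the pattern works, here is the same idea split into its three parts. It is a sketch rather than a definitive version; `end_line`, `pattern` and `sample` are names chosen here for illustration. The middle part, `(?:(?!END).)*`, is the tempered greedy token: it consumes one character at a time, but only while the closing line for the chosen error does not start at the current position.

```python
import re

error = "blah blah error"

# The closing line of an entry for the chosen error.
end_line = r"pythonerror\.Error: " + re.escape(error)

pattern = re.compile(
    # 1. Header: only entries for the chosen error can start a match.
    r"CRITICAL - Unexpected Error: " + re.escape(error)
    # 2. Tempered greedy token: any character (newlines included, because of
    #    re.DOTALL) as long as the closing line does not begin right here.
    + r"(?:(?!" + end_line + r").)*"
    # 3. Last run of non-whitespace before the closing line = source method.
    + r"\b(\S+)\n" + end_line,
    flags=re.DOTALL,
)

# Hypothetical sample: one entry for the chosen error, one for another error.
sample = (
    "CRITICAL - Unexpected Error: blah blah error\n"
    "Traceback (most recent call last):\n"
    '  File "example1.py", line 100, in method1\n'
    '  File "example3.py", line 300, in method3\n'
    "pythonerror.Error: blah blah error\n"
    "\n"
    "CRITICAL - Unexpected Error: other error\n"
    "Traceback (most recent call last):\n"
    '  File "example2.py", line 200, in method2\n'
    "pythonerror.Error: other error\n"
)

print(pattern.findall(sample))  # ['method3']
```

The entry for the other error is ignored because its header never matches part 1, so no match can start there.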
Andrej Kesely
  • perfect, would be great if you could explain how this works too – User Aug 20 '19 at 17:50
  • @User It uses a regex technique called the tempered greedy token: https://stackoverflow.com/questions/30900794/tempered-greedy-token-what-is-different-about-placing-the-dot-before-the-negat Basically, I exclude all error sections in the string that don't have the selected error inside. – Andrej Kesely Aug 20 '19 at 17:52