
I'm trying to parse a log file in which each line starts with a timestamp, such as:

[11/16/18 16:40:04:097 EST]

If there are no errors in the log, every line starts with this same pattern. However, if an error occurs, the whole error stack is printed after the timestamp, spread over several lines, as follows:

[11/16/18 16:40:04:100 EST] 000000ae CommerceSrvr  E MessagingViewCommandImpl nonHttpForwardDocument(String,String) CMN8014E: The URL constructed during composition using ViewName 
Additional Data: 
    null
Current exception:
Message:
_ERR_BSAFE_FUNCTION
Stack trace:
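To make the line structure concrete, here is a minimal sketch that tells timestamped entry lines apart from continuation lines. The full timestamp layout `[MM/DD/YY HH:MM:SS:mmm TZ]` is an assumption inferred from the sample lines above, and `is_entry_start` is a hypothetical helper name.

```python
import re

# Assumed timestamp layout, inferred from the sample lines:
# [MM/DD/YY HH:MM:SS:mmm TZ], e.g. [11/16/18 16:40:04:100 EST]
TIMESTAMP = re.compile(r"\[\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}:\d{3} [A-Z]+\]")


def is_entry_start(line):
    """Hypothetical helper: True if the line begins with a timestamp."""
    return TIMESTAMP.match(line) is not None


sample = [
    "[11/16/18 16:40:04:100 EST] 000000ae CommerceSrvr  E MessagingViewCommandImpl",
    "Additional Data: ",
    "    null",
    "Current exception:",
]

for line in sample:
    print(is_entry_start(line), repr(line))
# Only the first line starts a new entry; the others belong to its error stack.
```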

What I want to do is extract the whole error stack. For example, given this input:

[11/16/18 16:40:04:098 EST] 000000ae CommandLogger 2   PerfLog <entry operation="Command : com.ibm.commerce.messaging.viewcommands.MessagingViewCommandImpl" parameters="@releaseID=9.0 
[11/16/18 16:40:04:100 EST] 000000ae CommerceSrvr  E MessagingViewCommandImpl nonHttpForwardDocument(String,String) CMN8014E: The URL constructed during composition using ViewName 
Additional Data: 
    null
Current exception:
Message:
_ERR_BSAFE_FUNCTION
Stack trace:
[11/16/18 16:40:04:101 EST] 000000ae SystemErr     R   
[11/16/18 16:40:04:102 EST] 000000ae SystemErr     R   com.ibm.commerce.exception.ECSystemException: The URL constructed during composition using ViewName http://localhost:80/webapp/wcs/stores/IBM.WC.Compose/webservices/OAGIS/9.0/BODs/AcknowledgePaymentInstruction.jsp/******** is invalid {1}.
    at com.ibm.commerce.messaging.viewcommands.MessagingViewCommandImpl.nonHttpForwardDocument(MessagingViewCommandImpl.java:581)

The ideal output would be:

[11/16/18 16:40:04:100 EST] 000000ae CommerceSrvr  E MessagingViewCommandImpl nonHttpForwardDocument(String,String) CMN8014E: The URL constructed during composition using ViewName 
Additional Data: 
    null
Current exception:
Message:
_ERR_BSAFE_FUNCTION
Stack trace: 
[11/16/18 16:40:04:102 EST] 000000ae SystemErr     R   com.ibm.commerce.exception.ECSystemException: The URL constructed during composition using ViewName http://localhost:80/webapp/wcs/stores/IBM.WC.Compose/webservices/OAGIS/9.0/BODs/AcknowledgePaymentInstruction.jsp/******** is invalid {1}.
    at com.ibm.commerce.messaging.viewcommands.MessagingViewCommandImpl.nonHttpForwardDocument(MessagingViewCommandImpl.java:581)

I tried the following, but it fails. If you can tell me what's wrong with my code, that would be great.

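import re
import sys

# Read the log from the file named on the command line, or from stdin
if len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        text = f.read()
else:
    text = sys.stdin.read()

# Start and end markers: a line beginning with "[", two digits and "/"
p_start = r'^\[\d{2}/.*'
p_end = r'^\[\d{2}/.*'

# Start line, a negative lookahead on p0, then either a lazy run up to p1 or the rest
pattern = r'{p0}(?!.*{p0})(?:.*?{p1}|.*)'.format(p0=p_start, p1=p_end)

error_no_match = 'No Match'

matches = re.findall(pattern, text, flags=re.M | re.DOTALL)

if matches:
    for match in matches:
        print('match:', match)
    print(len(matches))
else:
    print(error_no_match)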
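Running the pattern from the code above on a shortened, made-up sample shows what actually happens. With `re.DOTALL`, the first `.*` after the timestamp also matches newlines, so it runs to the end of the input; at that point the negative lookahead has nothing left to reject, and the `|.*` alternative matches the empty string. Because `p_start` and `p_end` are identical, there is no distinct end marker to stop at either. The result is a single match spanning everything from the first timestamped line to the end.

```python
import re

# The pattern from the question, built the same way
p_start = r'^\[\d{2}/.*'
p_end = r'^\[\d{2}/.*'
pattern = r'{p0}(?!.*{p0})(?:.*?{p1}|.*)'.format(p0=p_start, p1=p_end)

# Made-up, shortened sample: a single-line entry followed by a multiline one
text = (
    "[11/16/18 16:40:04:098 EST] PerfLog entry\n"
    "[11/16/18 16:40:04:100 EST] CommerceSrvr  E CMN8014E\n"
    "Additional Data: \n"
    "    null\n"
    "Stack trace:"
)

matches = re.findall(pattern, text, flags=re.M | re.DOTALL)
print(len(matches))        # 1
print(matches[0] == text)  # True: the one match is the entire input
```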
Ian Zhang

1 Answer


As you read the whole file into a variable text, you may use

matches = re.findall(r'^\[\d{2}/.*(?:\n(?!\[\d{2}/).*)+', text, re.M)

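Note that if your text contains CRLF endings, you may replace \n with \r?\n (where CR is optional). In Python this is not strictly required: . also matches \r, so the pattern above still works, but the matches then contain the \r characters. Also, in Python 3, open() in text mode translates \r\n to \n by default.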
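A quick check of the CRLF point on a made-up sample with Windows line endings: both variants of the pattern return the same entry, and the \r characters stay inside the match because . consumes them.

```python
import re

lf_only = r"^\[\d{2}/.*(?:\n(?!\[\d{2}/).*)+"
cr_optional = r"^\[\d{2}/.*(?:\r?\n(?!\[\d{2}/).*)+"

# Made-up sample with Windows (CRLF) line endings
text = (
    "[11/16/18 16:40:04:098 EST] single-line entry\r\n"
    "[11/16/18 16:40:04:100 EST] multiline entry\r\n"
    "Additional Data: \r\n"
    "    null"
)

lf_matches = re.findall(lf_only, text, re.M)
crlf_matches = re.findall(cr_optional, text, re.M)
print(lf_matches == crlf_matches)  # True: "." already consumes the \r before each \n
print(lf_matches)                  # the \r characters are still part of the match
```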

Details

  • re.M modifier makes ^ match at the start of a line
  • ^ - start of a line
  • \[ - a [ char
  • \d{2}/ - 2 digits and a / char
  • .* - the rest of the line
  • (?:\n(?!\[\d{2}/).*)+ - one or more repetitions of
    • \n(?!\[\d{2}/) - an LF symbol (use \r?\n if there can be CRLF endings) that is not followed by [, two digits and /
    • .* - the rest of the line.
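For reference, here is the same pattern written with re.VERBOSE (re.X), which ignores unescaped whitespace in the pattern and allows # comments, so each part of the breakdown above sits next to its regex piece. The sample text is a shortened copy of log lines from the question.

```python
import re

# The answer's pattern written with re.VERBOSE (re.X): unescaped whitespace
# in the pattern is ignored and "#" starts a comment.
rx = re.compile(r"""
    ^\[\d{2}/ .*        # a line starting with [, two digits and /, plus the rest of it
    (?:                 # non-capturing group, repeated one or more times:
        \n              #   a line feed
        (?!\[\d{2}/)    #   not followed by [, two digits and /
        .*              #   the rest of that continuation line
    )+
""", re.M | re.X)

compact = r"^\[\d{2}/.*(?:\n(?!\[\d{2}/).*)+"

text = (
    "[11/16/18 16:40:04:101 EST] 000000ae SystemErr     R   \n"
    "[11/16/18 16:40:04:102 EST] 000000ae SystemErr     R   com.ibm.commerce.exception.ECSystemException\n"
    "    at com.ibm.commerce.messaging.viewcommands.MessagingViewCommandImpl.nonHttpForwardDocument(MessagingViewCommandImpl.java:581)"
)

verbose_matches = rx.findall(text)
print(verbose_matches == re.findall(compact, text, re.M))  # True
print(verbose_matches)
```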

Python demo:

import re
rx = r"^\[\d{2}/.*(?:\n(?!\[\d{2}/).*)+"  # a timestamped line plus one or more non-timestamped lines
text = "[11/16/18 16:40:04:098 EST] 000000ae CommandLogger 2   PerfLog <entry operation=\"Command : com.ibm.commerce.messaging.viewcommands.MessagingViewCommandImpl\" parameters=\"@releaseID=9.0 \n[11/16/18 16:40:04:100 EST] 000000ae CommerceSrvr  E MessagingViewCommandImpl nonHttpForwardDocument(String,String) CMN8014E: The URL constructed during composition using ViewName \nAdditional Data: \n    null\nCurrent exception:\nMessage:\n_ERR_BSAFE_FUNCTION\nStack trace:\n[11/16/18 16:40:04:101 EST] 000000ae SystemErr     R   \n[11/16/18 16:40:04:102 EST] 000000ae SystemErr     R   com.ibm.commerce.exception.ECSystemException: The URL constructed during composition using ViewName http://localhost:80/webapp/wcs/stores/IBM.WC.Compose/webservices/OAGIS/9.0/BODs/AcknowledgePaymentInstruction.jsp/******** is invalid {1}.\n    at com.ibm.commerce.messaging.viewcommands.MessagingViewCommandImpl.nonHttpForwardDocument(MessagingViewCommandImpl.java:581)"
matches = re.findall(rx, text, re.M)  # re.M lets ^ match at the start of every line
print(matches)

Output:

[
  '[11/16/18 16:40:04:100 EST] 000000ae CommerceSrvr  E MessagingViewCommandImpl nonHttpForwardDocument(String,String) CMN8014E: The URL constructed during composition using ViewName \nAdditional Data: \n    null\nCurrent exception:\nMessage:\n_ERR_BSAFE_FUNCTION\nStack trace:', 
  '[11/16/18 16:40:04:102 EST] 000000ae SystemErr     R   com.ibm.commerce.exception.ECSystemException: The URL constructed during composition using ViewName http://localhost:80/webapp/wcs/stores/IBM.WC.Compose/webservices/OAGIS/9.0/BODs/AcknowledgePaymentInstruction.jsp/******** is invalid {1}.\n    at com.ibm.commerce.messaging.viewcommands.MessagingViewCommandImpl.nonHttpForwardDocument(MessagingViewCommandImpl.java:581)'
]
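The single-line entries from the input (the 098 PerfLog line and the empty 101 SystemErr line) are not in the output. That is the effect of the final +: the group must match at least one continuation line. The comparison below swaps + for * only to illustrate this; with *, every timestamped line would be returned, including single-line ones. The sample text is made up.

```python
import re

one_or_more = r"^\[\d{2}/.*(?:\n(?!\[\d{2}/).*)+"   # the answer's pattern
zero_or_more = r"^\[\d{2}/.*(?:\n(?!\[\d{2}/).*)*"  # same pattern with *, for comparison only

# Made-up sample: two single-line entries around one multiline entry
text = (
    "[11/16/18 16:40:04:098 EST] single line A\n"
    "[11/16/18 16:40:04:100 EST] multiline entry\n"
    "Stack trace:\n"
    "[11/16/18 16:40:04:101 EST] single line B"
)

plus_matches = re.findall(one_or_more, text, re.M)
star_matches = re.findall(zero_or_more, text, re.M)
print(len(plus_matches))  # 1: only the multiline entry
print(len(star_matches))  # 3: every timestamped line
```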
Wiktor Stribiżew
  • Thanks for your help, Wiktor. I had a hard time figuring out `.*(?:\n(?!\[\d{2}/).*)+'` by myself... do you have any idea where I went wrong? – Ian Zhang Nov 26 '18 at 20:55
  • @IanZhang It is because you tried to match something in between, while here it is easier to match all consecutive lines that do not start with a certain pattern, and the `+` quantifier at the end is very important. – Wiktor Stribiżew Nov 26 '18 at 20:56
  • In `.*(?:\n(?![\d{2}/).*)+'`, the `[` is not escaped. You do not match the starting pattern either, although that is of minor importance. – Wiktor Stribiżew Nov 26 '18 at 20:59
  • thanks. Also, does the `+` apply to `.*(?:\n(?![\d{2}/).*)` or to the whole pattern `^\[\d{2}/.*(?:\n(?!\[\d{2}/).*)`? In addition, I got confused by `?:`; can you please explain this a bit? – Ian Zhang Nov 26 '18 at 21:04
  • @IanZhang `(?:...)` is a [non-capturing group](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do), only meant to group a number of patterns into a *sequence* that you may quantify or use with several alternatives. The last `+` in my pattern makes sure the regex only matches multiline entries. – Wiktor Stribiżew Nov 26 '18 at 21:17
  • got it, thanks a lot. I'm new to this, and got lost in terms of which part of the pattern is processed first... Not quite getting the operation order. – Ian Zhang Nov 26 '18 at 21:23
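A small illustration of why the group is non-capturing here: when a pattern contains a capturing group, re.findall returns the group's text instead of the whole match, and a repeated group only keeps its last repetition. The `capturing` variant below exists only for comparison, and the sample text is made up.

```python
import re

text = (
    "[11/16/18 16:40:04:100 EST] header\n"
    "Additional Data: \n"
    "    null"
)

non_capturing = r"^\[\d{2}/.*(?:\n(?!\[\d{2}/).*)+"  # the answer's pattern
capturing = r"^\[\d{2}/.*(\n(?!\[\d{2}/).*)+"        # same, but (...) instead of (?:...)

whole_entries = re.findall(non_capturing, text, re.M)
last_groups = re.findall(capturing, text, re.M)
print(whole_entries)  # the whole entry, from the timestamp to "    null"
print(last_groups)    # only the last repetition of the group: ['\n    null']
```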