We parse logs created by automated scripts. A typical thing we care about is the string '1.11.11-SNAPSHOT (1.11.11-20110303.024749-7)' from the following line:
15:28:02.115 - INFO - TestLib: Successfully retrieved build version: '1.11.11-SNAPSHOT (1.11.11-20110303.024749-7)'
The trouble is, some logs are manually created, with users entering this information themselves. To remind themselves of the format they have added a dialog with the template:
02:24:50.655 - INFO - gui: Step Dialog: For test results management purposes, specify the build in which the test is executed in the following format, build version: 'specify version here'
02:25:04.905 - INFO - gui: Response: OK
02:25:04.905 - INFO - gui: Comments: 'build version: '1.11.11''
My current regex for this is
`.*[Bb]uild [Vv]ersion:*\s*(?!.*<)'?([^']*)'`
The `(?!.*<)` was my first attempt to avoid this problem, because some users would write ''. That doesn't catch the above case, though. I think the correct fix is a negative lookbehind that refuses to match if 'Step Dialog' is present on the line, but my attempts to write one keep failing according to regexr (for some reason it's not letting me share the link to my saved form). I thought the negative lookbehind would look like `(?<!Step Dialog)`, giving:
`(?<!Step Dialog).*[Bb]uild [Vv]ersion:*\s*(?!.*<)'?([^']*)'`
but that still matches both the first and third lines of the dialog excerpt above for some reason.
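To make this reproducible outside regexr, here is a minimal sketch in Python (the choice of Python's `re` module is an assumption; the question does not name an engine). It runs the lookbehind pattern against the four log lines quoted above and, for comparison, a hypothetical variant that anchors at the start of the line and uses a negative lookahead instead. The names `LOOKBEHIND`, `LOOKAHEAD` and `extract` are made up for illustration.

```python
import re

LINES = [
    "15:28:02.115 - INFO - TestLib: Successfully retrieved build version: '1.11.11-SNAPSHOT (1.11.11-20110303.024749-7)'",
    "02:24:50.655 - INFO - gui: Step Dialog: For test results management purposes, specify the build in which the test is executed in the following format, build version: 'specify version here'",
    "02:25:04.905 - INFO - gui: Response: OK",
    "02:25:04.905 - INFO - gui: Comments: 'build version: '1.11.11''",
]

# The lookbehind attempt from the question. The assertion is only checked at
# the position where the match starts (offset 0), and no text precedes
# offset 0, so it passes on every line.
LOOKBEHIND = re.compile(r"(?<!Step Dialog).*[Bb]uild [Vv]ersion:*\s*(?!.*<)'?([^']*)'")

# Hypothetical alternative: anchor at the start of the line and look ahead
# across the whole line for 'Step Dialog' before matching anything else.
LOOKAHEAD = re.compile(r"^(?!.*Step Dialog).*[Bb]uild [Vv]ersion:*\s*(?!.*<)'?([^']*)'")

def extract(pattern, lines):
    """Return the captured version string for every line the pattern matches."""
    return [m.group(1) for m in map(pattern.search, lines) if m]

lookbehind_hits = extract(LOOKBEHIND, LINES)
lookahead_hits = extract(LOOKAHEAD, LINES)
print(lookbehind_hits)  # includes the template text 'specify version here'
print(lookahead_hits)   # the Step Dialog line is skipped
```

If this reading is right, the lookbehind fails because it only inspects the text immediately before the match start, whereas the leading `.*` is what actually consumes 'Step Dialog'. The anchored lookahead scans the whole line before anything is consumed.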
Edit:
`[Bb]` and `:*\s*` are there for users who entered the information in not precisely the right format, by capitalizing 'Build' or by using multiple colons and spaces. Suggestions for cleaning this up in general are appreciated; I'm relatively new to regexes.
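As one possible cleanup, here is a small sketch, again assuming Python's `re` module. It swaps the `[Bb]`/`[Vv]` character classes for the `re.IGNORECASE` flag and allows flexible whitespace between the words. The pattern name `VERSION`, the helper `version_of` and the sample inputs are hypothetical, and the sketch only covers the case and spacing variations, not the 'Step Dialog' problem.

```python
import re

# Hypothetical tidier pattern: IGNORECASE replaces [Bb]/[Vv], and \s+ / \s*
# tolerate extra spaces between the words and around the colon(s).
VERSION = re.compile(r"build\s+version\s*:*\s*(?!.*<)'?([^']*)'", re.IGNORECASE)

def version_of(line):
    """Return the quoted version string, or None if the line has none."""
    m = VERSION.search(line)
    return m.group(1) if m else None

# Made-up variations of the user-entered comment line.
samples = [
    "Comments: 'build version: '1.11.11''",
    "Comments: 'Build Version::  '1.11.11''",
    "Comments: 'BUILD  VERSION '1.11.11''",
]
results = [version_of(s) for s in samples]
print(results)
```

Note that `re.IGNORECASE` is looser than the original character classes: it also accepts spellings such as 'bUiLd', which may or may not be acceptable for these logs.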