
I am trying to capture a line in a logfile using the Oniguruma regex library (in Logstash) with a negative lookbehind, but it still matches the line that it shouldn't. I want to match only the top-level exception and not the one starting with Caused by:

Somebody helped me write this

Tested on Rubular http://rubular.com/r/N3AzySNHiS

Tested Regex

^(?<!Caused by: ).*?Exception

(?<!^Caused by: ).*?Exception

Message:

2016-11-15 05:19:28,801 ERROR [App-Initialisation-Thread] appengine.java:520 Failed to initialize external authenticator myapp Support Access || appuser@vm23-13:/mnt/data/install/assembly app-1.4.12@cad85b224cce11eb5defa126030f21fa867b0dad
java.lang.IllegalArgumentException: Could not check if provided root is a directory
    at com.myapp.jsp.KewServeInitContextListener$1.run(QServerInitContextListener.java:104)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.file.NoSuchFileException: fh-ldap-config/
    at com.upplication.s3fs.util.S3Utils.getS3ObjectSummary(S3Utils.java:55)
    at com.upplication.s3fs.util.S3Utils.getS3FileAttributes(S3Utils.java:64)

Logstash result

"exception" => "Caused by: java.nio.file.NoSuchFileException"
Arturski
  • Try `^(?!Caused by: ).*?Exception`, or `^(?!Caused by:)(?<exception>.*?Exception)` – Wiktor Stribiżew Nov 28 '16 at 11:06
  • Thank you for the reply Wiktor, first returned `"exception" => " at java.lang.Thread.run(Thread.java:745)\nCaused by: java.nio.file.NoSuchFileException"` , the second one returned 2 results ` "exception" => [ [0] " at java.lang.Thread.run(Thread.java:745)\nCaused by: java.nio.file.NoSuchFileException", [1] " at java.lang.Thread.run(Thread.java:745)\nCaused by: java.nio.file.NoSuchFileException"` – Arturski Nov 28 '16 at 11:19
  • I suspect there is some setting that makes the `.` symbol in the regex match line break symbols, or some other option like Ignore whitespace is ON. Please check if MULTILINE mode is turned on anywhere. Also, it is a good idea to try the `^(?!Caused\ by:)(?<exception>[^\r\n]*?Exception)` regex – Wiktor Stribiżew Nov 28 '16 at 11:26
  • Thank you @WiktorStribiżew! the last regex worked like a charm but returned 2 results `"exception" => [ [0] "com.fredhopper.frontend.view.ViewCreationException", [1] "com.fredhopper.frontend.view.ViewCreationException" ` does it match the same line twice somehow? – Arturski Nov 28 '16 at 12:45
  • No, again, that is something we talked about last time, I have no idea what setting might return the captured text twice. – Wiktor Stribiżew Nov 28 '16 at 12:47

2 Answers


It seems there are some additional options set in your Logstash environment. From my tests, I suspect the "verbose" or "ignore whitespace" option is enabled. Also, to rule out any other issues with . (which may be redefined to match line break symbols), you may use an unambiguous [^\r\n] (any character other than \r and \n):

^(?!Caused\ by:)(?<exception>[^\r\n]*?Exception)
          ^^                 ^^^^^^^

The escaped space will always match a single regular space.
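To see why the escaped space and the explicit [^\r\n] matter, here is a minimal sketch in Python rather than Oniguruma. It assumes Python's re.VERBOSE flag behaves like the suspected "ignore whitespace" option, spells the named group the Python way as (?P<exception>...), and needs re.MULTILINE so that ^ matches at each line start. The LOG sample is a shortened copy of the message from the question, and the variable names are only illustrative.

```python
import re

# Shortened copy of the log event from the question (an illustrative sample).
LOG = "\n".join([
    "2016-11-15 05:19:28,801 ERROR [App-Initialisation-Thread] appengine.java:520 Failed to initialize external authenticator",
    "java.lang.IllegalArgumentException: Could not check if provided root is a directory",
    "    at java.lang.Thread.run(Thread.java:745)",
    "Caused by: java.nio.file.NoSuchFileException: fh-ldap-config/",
    "    at com.upplication.s3fs.util.S3Utils.getS3ObjectSummary(S3Utils.java:55)",
])

# Escaped space plus [^\r\n]: behaves the same with or without verbose mode.
FIXED = re.compile(
    r"^(?!Caused\ by:)(?P<exception>[^\r\n]*?Exception)",
    re.MULTILINE | re.VERBOSE,
)
fixed_hits = [m.group("exception") for m in FIXED.finditer(LOG)]

# Unescaped spaces in verbose mode are dropped, so the lookahead
# effectively becomes (?!Causedby:) and no longer rejects the line.
BROKEN = re.compile(r"^(?!Caused by: ).*?Exception", re.MULTILINE | re.VERBOSE)
broken_hits = [m.group(0) for m in BROKEN.finditer(LOG)]

print(fixed_hits)
print(broken_hits)
```

In this sketch the escaped-space pattern returns only the top-level exception, while the unescaped one also returns the Caused by: line. Whether Logstash actually enables such an option is still only a suspicion.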

Wiktor Stribiżew
  • Thanks again Wiktor! Working great, and I appreciate the explanation – Arturski Nov 28 '16 at 13:01
  • Yes, done, thanks Wiktor. By the way, we managed to get rid of the double result by removing `?<exception>`, so the final regex was `^(?!Caused\ by:)([^\r\n]*?Exception)` – Arturski Nov 29 '16 at 09:15
  • That means the [`named_captures_only`](https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-named_captures_only) was set to *false*. If it were the default (*true*), only the named capture group would be output. – Wiktor Stribiżew Nov 29 '16 at 09:19

Note: I am assuming throughout this answer that the two individual log lines shown in the problem and repeated below do not contain newlines and have been processed through the multiline codec plugin in Logstash, or that the newlines have been removed in some way.

TL;DR The Solution Using a Negative Lookbehind

A negative lookbehind will work if it is followed by an appropriate anchor. For the two lines in question, this works well:

^(?<!Caused by: )java.*Exception

Note: it could just be ^(?<!Caused by: )j.*Exception but I think the java makes it more readable.
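A quick way to check this pattern against the two lines is the sketch below. It uses Python's re module as a stand-in for Oniguruma (an assumption; the two engines agree on this syntax as far as I know), and the variable names are only illustrative. Note that in this sketch the Caused by: line is rejected because the literal java has to appear right at the start of the line.

```python
import re

PATTERN = re.compile(r"^(?<!Caused by: )java.*Exception")

# The two individual log lines from the question.
top_line = "java.lang.IllegalArgumentException: Could not check if provided root is a directory"
cause_line = "Caused by: java.nio.file.NoSuchFileException: fh-ldap-config/"

top_match = PATTERN.search(top_line)      # matches the top-level exception
cause_match = PATTERN.search(cause_line)  # no match: the line does not start with "java"

print(top_match.group(0) if top_match else None)
print(cause_match)
```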

Explanation of Problem with Sample Code

The problem with the given regular expressions, ^(?<!Caused by: ).*?Exception and (?<!^Caused by: ).*?Exception, is the reluctant *? quantifier, which allows something to be matched 0 or more times. As explained in this answer, the regex engine starts at the beginning of the string and moves left to right. The smallest possible number of characters (since the quantifier is reluctant) is none, but then the engine cannot match Exception, so it incrementally extends what . has matched before Exception ("backtracking"), moving left to right.

So the regex engine keeps trying to match one more character at a time (from left to right) until Exception is found after what it has consumed. Therefore the string

Caused by: java.nio.file.NoSuchFileException: fh-ldap-config/ at com.upplication.s3fs.util.S3Utils.getS3ObjectSummary(S3Utils.java:55) at com.upplication.s3fs.util.S3Utils.getS3FileAttributes(S3Utils.java:64)

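does match, because the engine has consumed everything up to Exception and Caused by: doesn't appear before this match. The lookbehind is checked at the start of the string, where no text precedes it, so it has nothing to reject. Essentially the .*? has consumed the Caused by: that the negative lookbehind is looking for.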
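The behaviour can be reproduced with a small sketch. It uses Python's re module as a stand-in for Oniguruma (an assumption, although both treat fixed-width lookbehinds the same way here), and it adds a negative lookahead variant purely for contrast.

```python
import re

line = "Caused by: java.nio.file.NoSuchFileException: fh-ldap-config/"

# Negative lookbehind: checked at position 0, where nothing precedes the
# match, so it cannot see "Caused by: " and the line still matches.
behind = re.search(r"^(?<!Caused by: ).*?Exception", line)

# Negative lookahead: checked at position 0 against the text that follows,
# so it does see "Caused by: " and the line is rejected.
ahead = re.search(r"^(?!Caused by: ).*?Exception", line)

print(behind.group(0) if behind else None)
print(ahead)
```

The lookbehind version returns the whole prefix including Caused by:, which is the same value Logstash reported in the question.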

Understanding Deeper

To understand what the regex engine is actually doing with lookarounds I recommend viewing this answer

I think it's easy to get caught out by quantifiers and lookarounds, and as a general rule I think lookarounds need to be anchored by something concrete (not .). To understand what I mean, let's look at a slight variation on the given regex with the greedy * quantifier. The regex ^(?<!Caused by: ).*Exception also matches the quoted string.

The reason is that the greedy * quantifier starts by consuming the entire string and then backtracks from right to left, as explained in the first linked answer above. For the same reason (but from the other side), once the engine matches Exception it holds everything from the start of the string up to Exception. It then looks behind what it has consumed, does not find Caused by:, and successfully matches the string.
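As a sanity check, the greedy and reluctant variants can be compared on the quoted string. This is a sketch in Python's re module, used here as an assumed stand-in for Oniguruma; the variable names are illustrative.

```python
import re

# The joined "Caused by:" event quoted earlier in this answer.
quoted = (
    "Caused by: java.nio.file.NoSuchFileException: fh-ldap-config/ "
    "at com.upplication.s3fs.util.S3Utils.getS3ObjectSummary(S3Utils.java:55) "
    "at com.upplication.s3fs.util.S3Utils.getS3FileAttributes(S3Utils.java:64)"
)

greedy = re.search(r"^(?<!Caused by: ).*Exception", quoted)
reluctant = re.search(r"^(?<!Caused by: ).*?Exception", quoted)

# Both variants start matching at index 0 and end after the only "Exception".
print(greedy.group(0) if greedy else None)
print(reluctant.group(0) if reluctant else None)
```

Both searches succeed and return the same text, so switching between greedy and reluctant quantifiers does not help by itself.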

In Summary, as a General Rule

Always anchor lookarounds when using greedy or reluctant quantifiers.

ddrake12