I designed a regex to match every multiline exception or warning message for the fluentd parser, in Rubular format, as below:

(SLF4J:\s.*|[a-zA-z_]*\..*\.*\s.*\s.*|Caused\sby:\s|\s+at\s.*|\s+\.\.\. (\d)+ more)

It matches unnecessary fields.

I want to match the start of every multiline exception or warning. In short: the most recent multiline message is read from the beginning of the file until the next line is JSON. A JSON line always starts with {" together. When we see a line beginning with {", we stop reading the multiline message.

Either one regex covering both cases or two regexes (one per case) is fine.

Demo links

Demo 1: https://rubular.com/r/O26Wm6mc7z51re

Demo 2: https://rubular.com/r/v6Q7iwZqmNDAAx

Test strings:

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more
{"log_timestamp": "2021-02-18T11:33:23.114+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled)", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "PeerState set to LOOKING"}
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.intam.svc.cluster.local"}
java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local
        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)
        at org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:764)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:699)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
        at java.lang.Thread.run(Thread.java:748)
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.sxc.svc.cluster.local"}

Expected match for demo 1 (https://rubular.com/r/O26Wm6mc7z51re):

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more

Expected match for demo 2 (https://rubular.com/r/v6Q7iwZqmNDAAx):

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type 
SKumar

1 Answer

You might get both parts using a single pattern with a capture group and a backreference:

^(SLF4J:|java\.lang\.InterruptedException:).*(?:\R(?!\1|{).*)*

The pattern matches:

  • ^ Start of string
  • (SLF4J:|java\.lang\.InterruptedException:).* Capture either of the alternatives in group 1, then match the rest of the line
  • (?: Non-capture group
    • \R(?!\1|{).* Match a newline, assert that the next line does not start with what is captured in group 1 or with {, then match that line
  • )* Close the group and optionally repeat to match all lines

Regex demo

See the rubular match for the first part and the second part.

Note that in Java you have to double the backslashes:

String regex = "^(SLF4J:|java\\.lang\\.InterruptedException:).*(?:\\R(?!\\1|\\{).*)*";
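As a rough sketch of how that string could be used to take only the first match: the class name FirstBlockDemo, the helper firstBlock, and the shortened sample log are assumptions for illustration and are not part of the question. Without the MULTILINE flag, ^ anchors at the start of the input, so find() returns the leading block and stops before the line that starts with {.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstBlockDemo {
    // Pattern from the answer, with backslashes doubled for a Java string literal.
    private static final Pattern BLOCK = Pattern.compile(
        "^(SLF4J:|java\\.lang\\.InterruptedException:).*(?:\\R(?!\\1|\\{).*)*");

    // Hypothetical helper: returns the leading multiline block, or null when the
    // input does not start with one of the two hardcoded prefixes.
    static String firstBlock(String input) {
        Matcher m = BLOCK.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened sample modelled on the test strings in the question.
        String log = String.join("\n",
            "java.lang.InterruptedException: Timeout while waiting for epoch from quorum",
            "        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)",
            "        ... 19 more",
            "{\"log_level\": \"WARN\", \"log_message\": \"PeerState set to LOOKING\"}",
            "java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local");
        System.out.println(firstBlock(log));
    }
}
```

Because the backreference \1 also stops the match at a following line with the same prefix, consecutive SLF4J: lines come back as separate one-line matches with this pattern.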

To avoid crossing into the next SLF4J line or into a different type of exception (denoted as a dot-separated name at the start of a line):

^(?:SLF4J:|\w+(?:\.\w+)+).*(?:\R(?!(?:SLF4J:|\w+(?:\.\w+)+)|{).*)*
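A minimal Java sketch of the generic pattern, collecting every block instead of only the first. The class name ExceptionBlocks, the helper blocks, and the sample log are hypothetical. Pattern.MULTILINE is an addition made here so that ^ can match at each line start, and { is escaped as in the Java string above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExceptionBlocks {
    // Start of a block: "SLF4J:" or a dot-separated name such as java.net.UnknownHostException.
    private static final String START = "(?:SLF4J:|\\w+(?:\\.\\w+)+)";

    // Generic pattern from the answer. MULTILINE (an assumption of this sketch)
    // lets ^ match at the start of every line, not only at the start of the input.
    private static final Pattern BLOCK = Pattern.compile(
        "^" + START + ".*(?:\\R(?!" + START + "|\\{).*)*", Pattern.MULTILINE);

    // Hypothetical helper: returns each multiline block found in the input, in order.
    static List<String> blocks(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "java.lang.InterruptedException: Timeout while waiting for epoch from quorum",
            "        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)",
            "        ... 19 more",
            "{\"log_level\": \"WARN\", \"log_message\": \"PeerState set to LOOKING\"}",
            "java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local",
            "        at java.lang.Thread.run(Thread.java:748)");
        // Print each block separated by a marker line.
        for (String block : blocks(log)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

If only the first occurrence is wanted, taking the first element of the returned list (when it is not empty) would be enough.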

Regex demo

The fourth bird
  • An exception can start with any name. In our example it is java.lang.InterruptedException, but it can be xxx.yyy.zzz or org.apache.xxx. Can we make the hardcoded value generic? – SKumar Feb 18 '21 at 15:27
  • org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:699) dorg.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManorg.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:699) ddddddddd: it is not stopping at 699); it is also taking the next line ddd. For any exception, can we stop at 699)? – SKumar Feb 18 '21 at 15:41
  • I want only the first occurrence of the multiline block. Your given regex is available at: https://rubular.com/r/xNbOmFdTFV8Hlx . – SKumar Feb 18 '21 at 16:36
  • @SKumar Like this? https://rubular.com/r/d6uRi8GsDFPwk1 – The fourth bird Feb 18 '21 at 16:37
  • For example, as above, I want only this much, not all of it: – SKumar Feb 18 '21 at 16:37
  • java.lang.InterruptedException: Timeout while waiting for epoch from quorum at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227) at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284) ... 19 more – SKumar Feb 18 '21 at 16:37
  • Anything below will be rejected. It always looks at the beginning of the multiline block. But your solution is perfect. If it is possible in regex, then I can avoid logic in code. – SKumar Feb 18 '21 at 16:39
  • Fluentd collects logs and sends them to Kibana. While sending, we apply your regex to eliminate multiline messages. Fluentd accepts Ruby-style regex. If the regex works directly, then we do not need our own custom plugin; otherwise we might write one in Java. – SKumar Feb 18 '21 at 16:46
  • @SKumar Another option could be for example https://rubular.com/r/SSbpFsXXknZbCL with the `m` flag and a capture group. The other patterns will work in Java as well, you just need to get the first match only. – The fourth bird Feb 18 '21 at 17:00
  • In general, an exception stops at a line of three dots, a number, and "more", like ... 11 more. I wrote \.\.\. (\d)+ more for this. In my case that line is optional, but from my data I understood that I should stop reading the multiline block when I get that line or when the next line is JSON. – SKumar Feb 18 '21 at 17:04
  • In short: the most recent multiline message is read from the beginning of the file until the next line is JSON. A JSON line always starts with {" together. When we see a line beginning with {", we stop reading the multiline message. – SKumar Feb 18 '21 at 17:34
  • Unfortunately, it is not working in Ruby code. – SKumar Feb 19 '21 at 06:56
  • Your regex is available at: https://rubular.com/r/4hrpKb29nQgyH5 – SKumar Feb 19 '21 at 06:56
  • The regex below works in the Ruby plugin of Fluentd but takes unnecessary records. Your regex is available at: https://rubular.com/r/IFEwL8Sk998muT – SKumar Feb 19 '21 at 06:57
  • We can check the regex here: https://fluentular.herokuapp.com/parser – SKumar Feb 19 '21 at 06:59