I have a log string like this:

String s0 = "DC696,\"/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getRortList.dwr\",\"2222-11-10 08:32:22,351               PLV=REQ CIP=9.9.9.7 CMID=syairp CMN=\"\"Dub Airport Corporation Limited\"\" SN=sfv4_APM180885. DPN=dbPool66HFT01 UID=3862D04108 UN=91F6025D47F01D IUID=1931 LOC=en_GB EID=\"\"EVENT-UNKNOWN-UNKNOWN-ob55abe0118-201110083217-396080\"\" AGN=\"\"[Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.35]\"\" RID=REQ-[7274545]  MTD=POST URL=\"\"/xi/ajax/remoting/call/plaincall/adhocRrtBuilderCoollerProxy.getRtList.dwr\"\" RQT=2835 MID=ADIN PID=ADMIN PQ=ADIN_PAGE SUB=0 MEM=2331036 CPU=2410 UCPU=2300 SCPU=110 FRE=10 FWR=0 NRE=2281 NWR=218 SQLC=43 SQLT=142 RPS=200 SID=60826A3FAB005A8A9B930177C5******.pc6bc1029 GID=e262dde6d0e040070b58afd4c8 HSID=ddc665538db779508d3213c0bb63bcb1c49fe8236d5f0884ae975915728e61 CSL=CRITICAL CCON=0 CSUP=0 CLOC=0 CEXT=0 CREM=0 STK={\"\"n\"\":\"\"/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getrtList.dwr\"\",\"\"i\"\":1,\"\"t\"\":2835,\"\"slft\"\":2679,\"\"sub\"\":[{\"\"n\"\":\"\"SQL:select * from sfv4_HOUA180885.REPORT_DEF WHERE REPORT_DEF_ID IN (SELECT REPORT_DEF_ID FROM sfv4_HA80885.REPORT_DTASET WHERE REPORT_ID=?) AND DELETED=? ORDER BY REPORT_DEF_ID asc NULLS LAST\"\",\"\"i\"\":17,\"\"t\"\":40,\"\"slft\"\":40,\"\"st\"\":337,\"\"m\"\":220958,\"\"nr\"\":154,\"\"rt\"\":0,\"\"rn\"\":22,\"\"fs\"\":0}]}   \",\"2022-11-09T21:32:22.351+0000\",p66cf1029,\"dc606_ss_application\",1,\"/app/tomcat/logs/pef.log\",\"perf_log_yxx\",swsskix13";

I want to extract the KEY=VALUE pairs, e.g. {PLV=REQ, CIP=9.9.9.7, CMN="Dub Airport Corporation Limited", STK={...}}, into a Map<String,String>.

I tried the following, but it does not work:

String[] str1 = s0.split("\\s(?=(([^\"]*\"))*[^\"]*$)\\s*");
System.out.println("Value of split string is "+ Arrays.toString(str1));

Any input would be a great help.

Alferd Nobel
  • You could do something like this https://regex101.com/r/AWOUIx/1 however that last STK value is going to need to be handled with special rules due to its embedded delimiters and escape characters – CAustin Nov 23 '22 at 03:19
  • I have [played with it too (demo)](https://regex101.com/r/3angXs/4). Yea, the last is challenging. My try works only for that part if `{`...`}` is directly after the `KEY=`, there is not another `{`...`}` and it's not deeper nested like in the sample. – bobble bubble Nov 23 '22 at 03:50
  • @bobblebubble Thank you! I want to capture the "2222-11-10 08:32:22,351" as "timeStamp". How do I do that? I tried something like `.*?\s*(\d++-\d++-\d++ \d++:\d++:\d++,\d++).*?\w+`, which does not work. – Alferd Nobel Nov 23 '22 at 09:55
  • @AlferdNobel See [this update](https://regex101.com/r/3angXs/5) (the timestamp would be in *group 3*). **Grouping** in my pattern as follows: **1** → *key*, **2** → *value*, **3** → *timestamp*, **4** → *remaining stuff*. But I see you got it solved already, cheers. :) – bobble bubble Nov 23 '22 at 23:27

1 Answer

You can use this solution:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s0 = "DC696,\"/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getRortList.dwr\",\"2222-11-10 08:32:22,351               PLV=REQ CIP=9.9.9.7 CMID=syairp CMN=\"\"Dub Airport Corporation Limited\"\" SN=sfv4_APM180885. DPN=dbPool66HFT01 UID=3862D04108 UN=91F6025D47F01D IUID=1931 LOC=en_GB EID=\"\"EVENT-UNKNOWN-UNKNOWN-ob55abe0118-201110083217-396080\"\" AGN=\"\"[Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.35]\"\" RID=REQ-[7274545]  MTD=POST URL=\"\"/xi/ajax/remoting/call/plaincall/adhocRrtBuilderCoollerProxy.getRtList.dwr\"\" RQT=2835 MID=ADIN PID=ADMIN PQ=ADIN_PAGE SUB=0 MEM=2331036 CPU=2410 UCPU=2300 SCPU=110 FRE=10 FWR=0 NRE=2281 NWR=218 SQLC=43 SQLT=142 RPS=200 SID=60826A3FAB005A8A9B930177C5******.pc6bc1029 GID=e262dde6d0e040070b58afd4c8 HSID=ddc665538db779508d3213c0bb63bcb1c49fe8236d5f0884ae975915728e61 CSL=CRITICAL CCON=0 CSUP=0 CLOC=0 CEXT=0 CREM=0 STK={\"\"n\"\":\"\"/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getrtList.dwr\"\",\"\"i\"\":1,\"\"t\"\":2835,\"\"slft\"\":2679,\"\"sub\"\":[{\"\"n\"\":\"\"SQL:select * from sfv4_HOUA180885.REPORT_DEF WHERE REPORT_DEF_ID IN (SELECT REPORT_DEF_ID FROM sfv4_HA80885.REPORT_DTASET WHERE REPORT_ID=?) AND DELETED=? ORDER BY REPORT_DEF_ID asc NULLS LAST\"\",\"\"i\"\":17,\"\"t\"\":40,\"\"slft\"\":40,\"\"st\"\":337,\"\"m\"\":220958,\"\"nr\"\":154,\"\"rt\"\":0,\"\"rn\"\":22,\"\"fs\"\":0}]}   \",\"2022-11-09T21:32:22.351+0000\",p66cf1029,\"dc606_ss_application\",1,\"/app/tomcat/logs/pef.log\",\"perf_log_yxx\",swsskix13";
// Group 1: key; Group 2: whole value; Group 5: text between doubled quotes; Group 6: unquoted value
String regex = "(\\w+)=((?=\\{)(?:(?=.*?\\{(?!.*?\\3)(.*\\}(?!.*\\4).*))(?=.*?\\}(?!.*?\\4)(.*)).)+?.*?(?=\\3)[^{]*(?=\\4$)|\"{2}(.*?)\"{2}|(\\S+))";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(s0);
Map<String, String> res = new HashMap<>();
while (m.find()) {
    // Default to the whole value (this keeps the braces of a {...} block)
    String val = m.group(2);
    if (m.group(5) != null) {
        val = m.group(5); // quoted value without the surrounding ""
    }
    if (m.group(6) != null) {
        val = m.group(6); // plain non-whitespace value
    }
    res.put(m.group(1), val);
    System.out.println(m.group(1) + " => " + val + "\n----");
}

Output:

PLV => REQ
----
CIP => 9.9.9.7
----
CMID => syairp
----
CMN => Dub Airport Corporation Limited
----
SN => sfv4_APM180885.
----
DPN => dbPool66HFT01
----
UID => 3862D04108
----
UN => 91F6025D47F01D
----
IUID => 1931
----
LOC => en_GB
----
EID => EVENT-UNKNOWN-UNKNOWN-ob55abe0118-201110083217-396080
----
AGN => [Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.35]
----
RID => REQ-[7274545]
----
MTD => POST
----
URL => /xi/ajax/remoting/call/plaincall/adhocRrtBuilderCoollerProxy.getRtList.dwr
----
RQT => 2835
----
MID => ADIN
----
PID => ADMIN
----
PQ => ADIN_PAGE
----
SUB => 0
----
MEM => 2331036
----
CPU => 2410
----
UCPU => 2300
----
SCPU => 110
----
FRE => 10
----
FWR => 0
----
NRE => 2281
----
NWR => 218
----
SQLC => 43
----
SQLT => 142
----
RPS => 200
----
SID => 60826A3FAB005A8A9B930177C5******.pc6bc1029
----
GID => e262dde6d0e040070b58afd4c8
----
HSID => ddc665538db779508d3213c0bb63bcb1c49fe8236d5f0884ae975915728e61
----
CSL => CRITICAL
----
CCON => 0
----
CSUP => 0
----
CLOC => 0
----
CEXT => 0
----
CREM => 0
----
STK => {""n"":""/xi/ajax/remoting/call/plaincall/adhocReportBuilderControllerProxy.getrtList.dwr"",""i"":1,""t"":2835,""slft"":2679,""sub"":[{""n"":""SQL:select * from sfv4_HOUA180885.REPORT_DEF WHERE REPORT_DEF_ID IN (SELECT REPORT_DEF_ID FROM sfv4_HA80885.REPORT_DTASET WHERE REPORT_ID=?) AND DELETED=? ORDER BY REPORT_DEF_ID asc NULLS LAST"",""i"":17,""t"":40,""slft"":40,""st"":337,""m"":220958,""nr"":154,""rt"":0,""rn"":22,""fs"":0}]}
----
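
The STK value above still carries the doubled quotes (`""`) from the surrounding quoted field. If the goal is to hand that value to a JSON parser, one possible follow-up step, sketched here under the assumption that every `""` stands for a single `"`, is to collapse them after extraction. The class and method names are hypothetical and not part of the solution above.

```java
// Hypothetical helper, not part of the original answer.
class UnescapeSketch {
    // Assumption: the log field doubles every double quote, so "" means one ".
    static String undoubleQuotes(String value) {
        return value.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the STK value printed above
        String stk = "{\"\"n\"\":\"\"SQL\"\",\"\"i\"\":1}";
        System.out.println(undoubleQuotes(stk)); // prints {"n":"SQL","i":1}
    }
}
```

This could be applied to `val` before `res.put(...)` for the keys whose values are expected to hold JSON.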

See the regex demo.

Regex details:

  • (\w+) - Group 1: one or more word chars
  • = - a = char
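  • ((?=\{)(?:(?=.*?\{(?!.*?\3)(.*\}(?!.*\4).*))(?=.*?\}(?!.*?\4)(.*)).)+?.*?(?=\3)[^{]*(?=\4$)|\"{2}(.*?)\"{2}|(\S+)) - Group 2: the value, matched by one of three alternatives:
      • (?=\{)(?:...)+?.*?(?=\3)[^{]*(?=\4$) - a {...} block that may contain nested braces, matched with the help of lookaheads and Groups 3 and 4, or
      • \"{2}(.*?)\"{2} - two " chars, then Group 5 capturing any zero or more chars other than line break chars, as few as possible, then two " chars, or
      • (\S+) - Group 6: one or more non-whitespace chars.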
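
To make the grouping easier to follow, here is a minimal sketch that keeps only the two simpler alternatives (doubled-quote value and non-whitespace run) and drops the brace-matching part, so it will not handle STK correctly. The class name `SimplePairsSketch` and the shortened sample line are assumptions for illustration; note that the group numbers shift to 2 and 3 because the helper groups are gone.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class SimplePairsSketch {
    // Group 1: key; Group 2: text between doubled quotes; Group 3: non-whitespace run
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=(?:\"{2}(.*?)\"{2}|(\\S+))");

    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps insertion order
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Exactly one of the two value groups participates in each match
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            result.put(m.group(1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample in the same format as the log line above
        String sample = "PLV=REQ CIP=9.9.9.7 CMN=\"\"Dub Airport Corporation Limited\"\" RQT=2835";
        System.out.println(parse(sample));
        // prints {PLV=REQ, CIP=9.9.9.7, CMN=Dub Airport Corporation Limited, RQT=2835}
    }
}
```

The quoted alternative has to come before `(\S+)`; otherwise a value such as `""Dub Airport Corporation Limited""` would be cut at the first space.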
Wiktor Stribiżew
  • This is great! For STK, I want to include the {} as part of the capture. Also, I want to capture the "2222-11-10 08:32:22,351" as "timeStamp". How do I do that? I tried something like `.*?\s*(\d++-\d++-\d++ \d++:\d++:\d++,\d++).*?\w+`, which does not work. – Alferd Nobel Nov 23 '22 at 09:51
  • @AlferdNobel Replace `.replaceAll("^(?:\"\"|\\{)|(?:\\}|\"\")$", "")` with `.replaceAll("^\"\"|\"\"$", "")` to keep the braces. Actually, you may use [this regex](then) and the code should be a little adjusted. The timestamp extraction is out of scope in the current question, but `\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+` [works](https://regex101.com/r/jImA4l/1). – Wiktor Stribiżew Nov 23 '22 at 09:56
  • @AlferdNobel See https://ideone.com/8Nw2AE – Wiktor Stribiżew Nov 23 '22 at 10:11
  • Thanks for your contribution. That was very helpful! – Alferd Nobel Nov 23 '22 at 18:46
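
For the timestamp asked about in the comments, the pattern `\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+` suggested there can be applied on its own. The following is only a sketch: the class and method names are hypothetical, and it assumes the first match in the line is the timestamp of interest.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class TimestampSketch {
    // Pattern taken from the comment above: date, space, time, comma, milliseconds
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d+");

    // Returns the first timestamp found in the line, if any
    static Optional<String> firstTimestamp(String line) {
        Matcher m = TIMESTAMP.matcher(line);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the log line from the question
        String sample = "DC696,\"2222-11-10 08:32:22,351               PLV=REQ CIP=9.9.9.7";
        firstTimestamp(sample).ifPresent(ts -> System.out.println("timeStamp => " + ts));
        // prints timeStamp => 2222-11-10 08:32:22,351
    }
}
```

The result could then be stored next to the other pairs, e.g. with `res.put("timeStamp", ts)`. The ISO-style value later in the line (`2022-11-09T21:32:22.351+0000`) does not match, since the pattern requires a space after the date and a comma before the milliseconds.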