I have a log file that contains entries like the following:

[08/30/19 16:00:01:001 EDT] [SRNotes_Worker-1] INFO com.emc.clm.srnotes.schedule.SRNotesItemProcessor Started processing the SrTriageFile instance with ID 38 and file ID 250339290
[08/30/19 16:00:01:001 EDT] [SRNotes_Worker-1] TRACE org.springframework.jdbc.core.StatementCreatorUtils Setting SQL statement parameter value: column index 2, parameter value [73651266], value class [java.lang.String], SQL type unknown

What regex pattern can I use to extract all the relevant fields, as follows?

Timestamp: [08/30/19 16:00:01:001 EDT]
Thread: [SRNotes_Worker-1]
Level: INFO
Class: com.emc.clm.srnotes.schedule.SRNotesItemProcessor
Message: Started processing the SrTriageFile instance with ID 38 and file ID 250339290

I have written a function that goes through each character of the string and checks for '[', spaces and similar rules, and splits the log entries based on that. I know it's not an elegant solution and that I should use a regex, but I don't know enough about regular expressions.

  • What you are looking for is capturing groups. – jhamon Oct 30 '19 at 13:13
  • [This](https://www.tutorialspoint.com/javaregex/javaregex_capturing_groups.htm) guide on Tutorialspoint shows you how to set up a capturing group and [this](https://stackoverflow.com/questions/28038364/java-regular-expression-matcher-doesnt-find-all-possible-matches) thread on SE talks about that tutorial and about changing it to do something else. – yur Oct 30 '19 at 13:26
  • This should help you get started `^(\[.+?\])\s*(\[.+?\])` – MonkeyZeus Oct 30 '19 at 13:32
  • If you can configure log4j, it might be worth investigating whether you can simply change the log format to JSON, which is easier to process. – Thilo Oct 30 '19 at 13:55
  • https://regexr.com/4ns9o – slf Oct 30 '19 at 14:41
  • Not a duplicate of a **_Reference_** !! –  Oct 31 '19 at 00:44
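A minimal sketch of the capturing-group approach suggested in the comments, using Java's named groups (`(?<name>...)`) so each field can be read by name. It assumes every entry has exactly the layout shown in the question: a bracketed timestamp, a bracketed thread, then whitespace-free level and class tokens, then the message. The class `LogLineParser`, its `parse` method and the group names are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Assumed layout: [timestamp] [thread] LEVEL class message...
    // Timestamp and thread are captured with their brackets; level and class are
    // single whitespace-free tokens; the message is everything that remains.
    private static final Pattern LINE = Pattern.compile(
            "(?<timestamp>\\[[^\\]]+\\])\\s+"
          + "(?<thread>\\[[^\\]]+\\])\\s+"
          + "(?<level>\\S+)\\s+"
          + "(?<className>\\S+)\\s+"
          + "(?<message>.*)");

    /** Returns the five fields in order, or null if the line does not fit the assumed layout. */
    public static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) { // matches() requires the whole line to fit the pattern
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("Timestamp", m.group("timestamp"));
        fields.put("Thread", m.group("thread"));
        fields.put("Level", m.group("level"));
        fields.put("Class", m.group("className"));
        fields.put("Message", m.group("message"));
        return fields;
    }

    public static void main(String[] args) {
        String line = "[08/30/19 16:00:01:001 EDT] [SRNotes_Worker-1] INFO "
                + "com.emc.clm.srnotes.schedule.SRNotesItemProcessor "
                + "Started processing the SrTriageFile instance with ID 38 and file ID 250339290";
        Map<String, String> fields = parse(line);
        fields.forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

For the first sample line, `main` prints the five fields in the same "Name: value" form as the list in the question. Lines that don't fit the layout make `parse` return `null`, so you can skip them or handle them separately.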

2 Answers

You should try to learn how regex works so you can build your own expressions, but here is an example:

(\[[^\]]+\])\s(\[[^\]]+\])\s([^\s]+)\s([^\n]+)

//Finds a sequence starting with a [ and ending with the next ]
//(anything is allowed in between: one or more characters that are not ])
//The parentheses ( ) mark a capturing group, whose content is extracted
(\[[^\]]+\])

//Matches one whitespace character
            \s

//Same as the first group
              (\[[^\]]+\])

//Whitespace again
                          \s

//Selects everything up to the next whitespace
                            ([^\s]+)

//Whitespace one last time
                                    \s

//Selects everything up to the line break
                                      ([^\n]+) 

Note that the last group captures the class name and the message together, so you will probably want to tweak this further for your case, but this should be enough to get you started.
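The pattern above carries over to Java once each backslash is doubled inside the string literal. The sketch below is an adaptation, not part of the answer as written: it adds one more `([^\s]+)\s` group so the class name and the message land in separate groups, and it applies the pattern to every line of a log. `BracketRegexDemo` and `parseAll` are hypothetical names; to read a real file you could pass `Files.readAllLines(path)` as the list of lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketRegexDemo {
    // The answer's pattern with doubled backslashes, plus one extra "([^\s]+)\s"
    // so that group 4 is the class name and group 5 is the message.
    static final Pattern ENTRY = Pattern.compile(
            "(\\[[^\\]]+\\])\\s(\\[[^\\]]+\\])\\s([^\\s]+)\\s([^\\s]+)\\s([^\\n]+)");

    /** Returns one String[5] per matching line; lines that don't match are skipped. */
    static List<String[]> parseAll(List<String> lines) {
        List<String[]> entries = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (m.find()) {
                String[] fields = new String[5];
                for (int i = 0; i < 5; i++) {
                    fields[i] = m.group(i + 1); // group 0 is the whole match, so start at 1
                }
                entries.add(fields);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "[08/30/19 16:00:01:001 EDT] [SRNotes_Worker-1] INFO com.emc.clm.srnotes.schedule.SRNotesItemProcessor Started processing the SrTriageFile instance with ID 38 and file ID 250339290",
                "[08/30/19 16:00:01:001 EDT] [SRNotes_Worker-1] TRACE org.springframework.jdbc.core.StatementCreatorUtils Setting SQL statement parameter value: column index 2, parameter value [73651266], value class [java.lang.String], SQL type unknown");
        for (String[] e : parseAll(lines)) {
            System.out.println(e[2] + " | " + e[3] + " | " + e[4]); // level | class | message
        }
    }
}
```

Lines that don't start with the bracketed timestamp (for example, blank lines) are skipped, because `find()` returns false for them.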

Tim Schmidt

If your lines always look as shown, you can also use the String.split method here. Split each line at every space, except where the space occurs inside square brackets, and limit the result to 5 parts:

String line = "[08/30/19 16:00:01:001 EDT] [SRNotes_Worker-1] INFO com.emc.clm.srnotes.schedule.SRNotesItemProcessor Started processing the SrTriageFile instance with ID 38 and file ID 250339290";

//line.split("\\s")  splits a string at each whitespace character
//line.split("\\s",5) splits at each whitespace character, but returns at most 5 parts
//line.split("\\s(?![^\\[]*[\\]])") splits at each whitespace character that is not inside []

Combining all of the above, you could do something like:

Arrays.stream(line.split("\\s(?![^\\[]*[\\]])",5)).forEach(System.out::println);

or save the parts into an array:

String[] myData = line.split("\\s(?![^\\[]*[\\]])",5);

and access each element: myData[0] is your timestamp, myData[1] is your thread, and so on.
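Here is the split approach put together as a runnable sketch. The regex and the limit of 5 are the ones from this answer; the class name `SplitDemo`, the `fields` helper and the label array are illustrative assumptions.

```java
public class SplitDemo {
    // Split at whitespace unless a ']' appears before the next '[',
    // i.e. unless the whitespace sits inside a [...] block.
    static final String OUTSIDE_BRACKETS = "\\s(?![^\\[]*[\\]])";

    static String[] fields(String line) {
        // Limit 5 means at most 4 splits, so the message keeps its spaces in the last element.
        return line.split(OUTSIDE_BRACKETS, 5);
    }

    public static void main(String[] args) {
        String line = "[08/30/19 16:00:01:001 EDT] [SRNotes_Worker-1] INFO com.emc.clm.srnotes.schedule.SRNotesItemProcessor Started processing the SrTriageFile instance with ID 38 and file ID 250339290";
        String[] labels = {"Timestamp", "Thread", "Level", "Class", "Message"};
        String[] myData = fields(line);
        for (int i = 0; i < myData.length; i++) { // myData.length is at most 5
            System.out.println(labels[i] + ": " + myData[i]);
        }
    }
}
```

One caveat: the lookahead only checks whether a `]` comes before the next `[`. If a message contained a `]` with no `[` before it, the whitespace ahead of it would not be split, and the fields would come out wrong.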

Eritrean