I have this log file that I'm currently trying to parse.

Jan 12 2019, 14:51:23, 117, 10.0.0.1, neil.armstrong, standard-users, -, TCP_Connect, "sports betting", -, 201, accept, GET, text, https, www.best-site.com, 443, /pages/home.php, ?user=narmstrong&team=wizards, -, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome Safari/537.36", 192.168.1.1, 1400, 1463, -, -, -
Jan 12 2019, 14:52:14, 86, 10.0.0.1, neil.armstrong, standard-users, -, TCP_Connect, "sports betting", -, 200, accept, POST, text, https, www.upload.best-site.com, 443, /, -, -, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", 192.168.1.1, 230056, 600, -, -, -
Jan 12 2019, 14:52:54, 118, 10.0.0.1, neil.armstrong, standard-users, -, TCP_Connect, "sports betting", -, 200, accept, GET, text/javascript, http, google.fr, 80, /search, ?q=wizards, -, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", 192.168.1.1, 1717, 17930, -, -, -

This is the regex that I'm currently using: https://regex101.com/r/Asbpkx/3. It parses the log file fine until it reaches "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", where it splits at the comma inside (KHTML, like Gecko). How can I complete the regex so that this does not happen?

tayvionp

3 Answers

It looks like you are trying to parse CSV using regex.

Use the regex described in this post: https://stackoverflow.com/a/18147076/9397882

Regex: (?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)

Russ Brown
  • I've looked into that thread, but it didn't answer the question. I'm parsing a log file, but when I get to this part of the log, `"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537"`, it parses this into a separate group `"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML` and this into another group `=like Gecko)` – tayvionp Mar 21 '19 at 21:55
  • Regex is not the right tool for parsing CSV. What is your desired output from this? – Russ Brown Mar 21 '19 at 22:24

Don't use regex for a CSV. Try these props.conf settings.

[mysourcetype]
INDEXED_EXTRACTIONS = CSV
FIELD_DELIMITER = ,
FIELD_QUOTE = "
FIELD_NAMES = Date, Time, Field3, IP_Addr, Field4, Field5, Field6
TIMESTAMP_FIELDS = Date, Time
RichG

I looked into this more closely, and the log file is not in CSV format, which is why the CSV-parsing regex from my previous answer didn't work. (I also tried parsing it with Excel and Python's csv module, and both split at the comma after 'KHTML'.)
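
One possible explanation, offered as an assumption rather than something verified against the original file: every comma in the log is followed by a space, so the double quote is not the first character of its field and a default CSV parser does not treat the field as quoted. If that is the cause, Python's csv module can be told to skip that space with `skipinitialspace=True`. The sketch below uses a hypothetical helper name (`parse_log_line`) and the third sample line from the question.

```python
import csv
import io

def parse_log_line(line):
    """Parse one log line as CSV, ignoring the space after each comma."""
    # skipinitialspace=True drops whitespace that directly follows a
    # delimiter, so a field beginning with a double quote is read as quoted.
    reader = csv.reader(io.StringIO(line), skipinitialspace=True)
    return next(reader)

# Third sample line from the question.
sample = 'Jan 12 2019, 14:52:54, 118, 10.0.0.1, neil.armstrong, standard-users, -, TCP_Connect, "sports betting", -, 200, accept, GET, text/javascript, http, google.fr, 80, /search, ?q=wizards, -, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", 192.168.1.1, 1717, 17930, -, -, -'

fields = parse_log_line(sample)
print(fields[20])  # the user-agent field, with the surrounding quotes removed
```

With this option the user-agent string stays in one field and the surrounding quotes are stripped. Note that the date and time still come out as two separate fields, because they are separated by a comma in the log.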

Using a negative lookbehind makes the example you gave parse correctly: a comma is treated as a field separator only when it is not immediately preceded by KHTML.

(.+?)(?<!KHTML), 
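
As a rough illustration of the same lookbehind in Python, the sketch below splits on the separator instead of capturing each field, which also keeps the last field (it has no trailing comma). This is a variant of the regex above rather than the exact pattern, and the names `FIELD_SEP` and `split_log_line` are made up for the example.

```python
import re

# A separator is a comma plus a space, unless the comma directly follows
# "KHTML" (the one place in the sample user-agent where ", " is not a
# field boundary).
FIELD_SEP = re.compile(r"(?<!KHTML), ")

def split_log_line(line):
    """Split one log line into fields using the lookbehind separator."""
    return FIELD_SEP.split(line.rstrip("\n"))

# Third sample line from the question.
sample = 'Jan 12 2019, 14:52:54, 118, 10.0.0.1, neil.armstrong, standard-users, -, TCP_Connect, "sports betting", -, 200, accept, GET, text/javascript, http, google.fr, 80, /search, ?q=wizards, -, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537", 192.168.1.1, 1717, 17930, -, -, -'

fields = split_log_line(sample)
print(fields[20])  # the user-agent field, still wrapped in double quotes
```

This only guards against the one comma after KHTML. Any other quoted value that contains a comma followed by a space would still be split, so it is a workaround for these samples and not a general parser.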
Russ Brown