
I'm trying to replace all spaces in a log file with commas (to convert it to CSV format). However, some log entries contain spaces that I don't want replaced: the ones inside values enclosed in double quotes. I looked at a couple of examples and came up with the following expression, which seems to work on RegExr.com and regex101.com.

[\s](?=(?:"[^"]*"|[^"])*$)

However, when I do a find/replace with that expression, it works correctly until it reaches the first quoted value containing a space, and then it selects the entire contents of the file.

Sample log file entry:

date=2020-08-24 time=07:35:15 idseq=216296511061885345 itime="2020-08-24 07:35:15" euid=3 epid=4107 dsteuid=3 dstepid=101 type="utm" subtype="webfilter" level="notice" action="passthrough" msg="URL belongs to an allowed category in policy"

Desired result:

date=2020-08-24,time=07:35:15,idseq=216296511061885345,itime="2020-08-24 07:35:15",euid=3,epid=4107,dsteuid=3,dstepid=101,type="utm",subtype="webfilter",level="notice",action="passthrough",msg="URL belongs to an allowed category in policy"


EDIT: After more testing, it appears that the replace works on a single line. However, with more than one line, it replaces every line with the replacement character (in my case, a comma).
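For anyone reproducing this outside an editor, here is a minimal Python sketch. It uses Python's `re` engine, not the editor's, so it is not claimed to reproduce the select-everything behavior. It does show one thing that is easy to overlook: `\s` also matches line breaks, so the original pattern turns the newline between entries into a comma as well. The `SPACE_PATTERN` variant and the helper names are hypothetical; processing one line at a time keeps every match and lookahead inside a single entry.

```python
import re

# The pattern from the question: a whitespace character that is followed by
# an even number of double quotes up to the end of the input, i.e. one that
# is not inside a quoted value.
QUESTION_PATTERN = re.compile(r'[\s](?=(?:"[^"]*"|[^"])*$)')

# Hypothetical variant: only spaces and tabs, so line breaks are never replaced.
SPACE_PATTERN = re.compile(r'[ \t](?=(?:"[^"]*"|[^"])*$)')


def to_csv_line(line: str) -> str:
    """Replace the unquoted spaces in one log entry with commas."""
    return SPACE_PATTERN.sub(",", line)


def to_csv_text(text: str) -> str:
    """Process each entry separately so no match (or lookahead) spans lines."""
    return "\n".join(to_csv_line(line) for line in text.splitlines())


log = 'a=1 b="x y"\nc=2 d="p q"'
print(QUESTION_PATTERN.sub(",", log))  # a=1,b="x y",c=2,d="p q"  (newline replaced too)
print(to_csv_text(log))                # two lines, each converted on its own
```

Note that `splitlines()` drops a trailing newline, so a real script writing back to a file would need to add it again.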

McKenning
  • It seems to be working fine (in Notepad++). However, your "desired result" seems to be missing some commas (for spaces not enclosed in quotation marks). Why is that? – 41686d6564 stands w. Palestine Aug 24 '20 at 14:36
  • I'm tired and didn't see those two in the preview. That is fixed now. Perhaps there is something in the full file (the single log line was redacted for simplicity and clarity) that is causing this behavior? – McKenning Aug 24 '20 at 14:40
  • The `[^"]` matches all characters except the double-quote, so it matches CRs and LFs. I think you need to use `[^"\r\n]` in both places so it only matches within a line. – AdrianHHH Aug 24 '20 at 14:54
  • @AdrianHHH: That didn't seem to fix it. I tried with `\r\n` as well as just `\n`, in both instances of `[^"]` as well as in just one or the other. – McKenning Aug 24 '20 at 14:58

2 Answers


While lengthy, if you have a known list of field names, you can simply use them as replacement keys:

  • The first field (date) is skipped, since it shouldn't be prefixed with a comma
  • Match the leading space and the trailing = around each label to be more certain (though this still doesn't guarantee it won't match substrings inside the msg field)
's/ (time|idseq|itime|euid|epid|dsteuid|dstepid|type|subtype|level|action|msg)=/,$1=/g'

Example in Python

>>> import re
>>> source = '''date=2020-08-24 time=07:35:15 idseq=216296511061885345 itime="2020-08-24 07:35:15" euid=3 epid=4107 dsteuid=3 dstepid=101 type="utm" subtype="webfilter" level="notice" action="passthrough" msg="URL belongs to an allowed category in policy"'''
>>> regex = ''' (time|idseq|itime|euid|epid|dsteuid|dstepid|type|subtype|level|action|msg)='''
>>> print(re.sub(regex, r",\1=", source))  # raw string, so \1 stays a backreference instead of becoming the "\x01" escape
date=2020-08-24,time=07:35:15,idseq=216296511061885345,itime="2020-08-24 07:35:15",euid=3,epid=4107,dsteuid=3,dstepid=101,type="utm",subtype="webfilter",level="notice",action="passthrough",msg="URL belongs to an allowed category in policy"

You may find some values contain \" or similar, which can break even quite careful regular expressions!
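If the log does escape embedded quotes as `\"` (an assumption; the sample line has none), a quoted value can be described as an opening quote, then any mix of ordinary characters and backslash-escape pairs, then a closing quote. A minimal Python sketch of that idea, with a hypothetical `PAIR` pattern and `parse_pairs` helper that split one entry into key/value pairs:

```python
import re

# Assumption: a double quote inside a value is written as \" (backslash-escaped).
# A value is either a quoted string, where each character is a non-quote,
# non-backslash character or a backslash followed by any character, or a bare
# token with no whitespace.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_pairs(line: str) -> list[tuple[str, str]]:
    """Split one log entry into (key, raw_value) pairs."""
    return PAIR.findall(line)


entry = r'level="notice" msg="say \"hi there\" now" euid=3'
for key, value in parse_pairs(entry):
    print(key, value)
```

Because the escaped quote is consumed as a unit by `\\.`, it cannot be mistaken for the closing quote.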

Also note that for a CSV you may wish to replace the field names entirely.
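As a rough sketch of that idea in Python, the key=value pairs can be parsed and handed to the standard `csv` module, with the field names moved into a header row. The `KV` pattern and `log_to_csv` helper are hypothetical, and the pattern assumes quoted values contain no escaped quotes.

```python
import csv
import io
import re

# Hypothetical key=value pattern: the value is either "quoted" (group 2,
# quotes stripped) or a bare token without whitespace (group 3).
KV = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def log_to_csv(lines: list[str]) -> str:
    """Convert key=value log entries to CSV text with the keys as the header."""
    rows = [{key: quoted or bare for key, quoted, bare in KV.findall(line)}
            for line in lines]

    # Header: every key, in the order it is first seen.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    # Entries missing a field get an empty cell (DictWriter's default restval).
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The `csv` module adds quotes only where a value needs them (for example, when it contains a comma), so the original log's quotes are not carried over.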

ti7
  • An interesting option. And you are correct, I do want to strip the header= text, but I didn't want to muddy the question with this in here. Ultimately, I went with Toto's answer below because it was more straightforward. – McKenning Aug 24 '20 at 16:38
  • Ctrl+H
  • Find what: "[^"\r\n]+"(*SKIP)(*FAIL)|\h+
  • Replace with: ,
  • CHECK Wrap around
  • CHECK Regular expression
  • Replace all

Explanation:

"[^"\r\n]+"     # everything between quotes
(*SKIP)(*FAIL)  # skip it and fail the match (the next attempt resumes after the quoted text)
|               # OR
\h+             # 1 or more horizontal spaces
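Python's built-in `re` module supports neither the `(*SKIP)(*FAIL)` verbs nor `\h`, so for scripting the same job outside Notepad++ a common substitute is to capture the quoted text and write it back unchanged, replacing only the blank runs. A minimal sketch; `[ \t]` stands in for `\h` here, which is an approximation (`\h` also covers other horizontal whitespace), and the helper name is hypothetical:

```python
import re

# Group 1 captures a quoted run (which cannot cross a line break); the other
# alternative matches one or more spaces/tabs outside quotes.
PATTERN = re.compile(r'("[^"\r\n]+")|[ \t]+')


def replace_unquoted_blanks(text: str, sep: str = ",") -> str:
    # A quoted match is returned unchanged; a blank run becomes the separator.
    return PATTERN.sub(lambda m: m.group(1) or sep, text)


print(replace_unquoted_blanks('a=1 b="x y"\nc=2  d="p q"'))
```

Since neither alternative can match a line break, each line is converted independently, which is the same property the `\r\n` exclusion gives the Notepad++ pattern.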


Toto