I'm trying to find a regex pattern that matches strings in LSF log files (lsb.acct).
Each line (log entry) in lsb.acct contains numbers and strings. Numbers are easy to match, but strings are a challenge.
Strings are enclosed in double quotes (`"..."`), and a `"` character inside a string is escaped by doubling it (`""`).
Example: the lsb.acct line `0 90 "dsfc --copt ""-c"" nfc_pa" 0.01 "" -1` contains the following fields:

0 (number)  
90 (number)  
dsfc --copt "-c" nfc_pa (string)  
0.01 (number)  
(empty string)  
-1 (number)  

I tried `/"([^"]*)"/`, but it doesn't solve the problem: it stops at the first escaped double quote and cuts the string short.
I was thinking of adding a look-ahead such as `"(?=")` to account for the escaped double quote, but I don't know where or how to put it; it doesn't work inside `[]`.
Can anybody suggest a regex that matches the whole string, treating `""` as an escaped `"` inside the string?
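
For reference, a minimal reproduction of the failed attempt, using Python's `re.findall` on the example line. The variable names are only illustrative:

```python
import re

line = '0 90 "dsfc --copt ""-c"" nfc_pa" 0.01 "" -1'

# The naive pattern ends a match at the first quote it meets,
# so each doubled quote splits the field into separate matches.
naive = re.findall(r'"([^"]*)"', line)
print(naive)  # ['dsfc --copt ', '-c', ' nfc_pa', '']
```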

  • Your example string is `0 90 "dsfc --copt ""-c"" nfc_pa" 0.01 "" -1`, what is the expected outcome? – Christian Baumann Sep 24 '20 at 13:43
  • @ChristianBaumann, my expected outcome is a Python list of matched strings from re.findall(pattern,line) - with escape characters removed. For my example: re.findall(pattern,'0 90 "dsfc --copt ""-c"" nfc_pa" 0.01 "" -1') should return ['dsfc --copt "-c" nfc_pa', '']. – Tomasz Jozefiak Sep 24 '20 at 17:07
  • @WiktorStribiżew, thanks for your feedback. I'm new to the site and didn't realize the etiquette. I have spent considerable time looking for a solution; please see my edited question. – Tomasz Jozefiak Sep 24 '20 at 17:15
  • Use `"((?:[^"]+|"")*)"|\S+`, see [demo](https://regex101.com/r/zBwPW6/1). – Wiktor Stribiżew Sep 24 '20 at 19:06
  • This works! It doesn't strip the escape " inside the string, but this is something I can live with :-) Thanks @WiktorStribiżew, greatly appreciated! – Tomasz Jozefiak Sep 25 '20 at 15:48
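
The remaining step mentioned in the last comment, removing the doubled quotes, can be handled after matching. The following is a minimal sketch, not a definitive solution. It assumes only the quoted fields are wanted, so it keeps the quoted part of the suggested pattern and drops the `|\S+` alternative; with that alternative, `re.findall` would also return an empty string for each unquoted token. The helper name `quoted_strings` is hypothetical.

```python
import re

# Quoted-string part of the pattern suggested in the comments:
# an opening quote, then any run of non-quote characters or doubled
# quotes, then the closing quote.
QUOTED = re.compile(r'"((?:[^"]+|"")*)"')

def quoted_strings(line):
    """Return the quoted fields of a line with each "" collapsed to "."""
    return [s.replace('""', '"') for s in QUOTED.findall(line)]

line = '0 90 "dsfc --copt ""-c"" nfc_pa" 0.01 "" -1'
print(quoted_strings(line))  # ['dsfc --copt "-c" nfc_pa', '']
```

Doing the un-escaping with `str.replace` after the match keeps the regex itself unchanged, since a regex match can only return text that is present in the input.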
