I'm trying to parse

|123|create|item|1497359166334|Sport|Some League|\|Team\| vs \|Team\||1497359216693|

with a regex (https://regex101.com/r/KLzIOa/1/).

I currently have:

[^|]++

This parses everything correctly except \|Team\| vs \|Team\|

I would expect that field to be parsed as |Team| vs |Team|

If I change the regex to

[^\\|]++

it splits the field at every backslash and pipe, so the two teams are parsed separately instead of being kept together with the escaped pipes.

Basically, I want to parse the fields between the pipes, but any escaped pipes should be captured as part of the field they appear in. So with my example I would expect

["123", "create", "item", "1497359166334", "Sport", "Some League", "|Team| vs |Team|", "1497359216693"]
Jack Wilkinson

3 Answers

You can alternate between:

  • \\. - a literal backslash followed by any character, or
  • [^|\\]+ - one or more characters that are neither a pipe nor a backslash

(?:\\.|[^|\\]+)+

https://regex101.com/r/KLzIOa/2

Note that there's no need for the possessive quantifier, because no backtracking will occur.

If you also want to replace \|s with |s, then do that afterwards: match \\\| and replace with |.
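
As a rough illustration of the two steps above (match, then unescape), here is a minimal Python sketch. The question does not say which language is in use, so Python's `re` module and the helper name `parse_fields` are assumptions made for this example only.

```python
import re

# Sample line from the question, written as a raw string so the
# backslashes are kept literally.
line = r"|123|create|item|1497359166334|Sport|Some League|\|Team\| vs \|Team\||1497359216693|"

# A field is a run of "backslash + any character" pairs or of
# characters that are neither a pipe nor a backslash.
FIELD = re.compile(r"(?:\\.|[^|\\]+)+")

def parse_fields(text):
    """Hypothetical helper: split on unescaped pipes, then unescape \\| to |."""
    raw_fields = FIELD.findall(text)
    # Second pass: turn each escaped pipe back into a plain pipe.
    return [field.replace(r"\|", "|") for field in raw_fields]

fields = parse_fields(line)
```

Because the pattern needs at least one character to match, empty fields (two adjacent pipes) are skipped rather than returned as empty strings.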

CertainPerformance

To handle escaping, you should match a backslash and the character after it as a single "item".

(?:\\.|[^|])++

This conveniently also works for escaping the backslashes themselves!

To then remove the backslashes from the results, use a simple replacement:

Replace: \\(.)
With: $1
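
The replacement step might look like this in Python; this is only a sketch, and both the choice of language and the function name `unescape` are assumptions. Python writes the backreference as `\1` rather than `$1`.

```python
import re

def unescape(field):
    """Hypothetical helper: drop each escaping backslash, keep the character after it."""
    # \\(.) matches a backslash plus the following character;
    # \1 puts back only that following character.
    return re.sub(r"\\(.)", r"\1", field)

team_field = unescape(r"\|Team\| vs \|Team\|")
path_field = unescape(r"C:\\temp")  # an escaped backslash becomes a single backslash
```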
Niet the Dark Absol

Use:

(?:\\\||[^|])+

Demo & explanation
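
One difference from the other answers is worth noting: this pattern treats only \| as an escape, so a backslash that is itself escaped is not recognised. The Python sketch below compares the two behaviours on a made-up input; the sample string and variable names are illustrative and do not come from the question.

```python
import re

# Pattern from this answer: only an escaped pipe is special.
PIPE_ONLY = re.compile(r"(?:\\\||[^|])+")
# Pattern that treats a backslash plus any character as one unit.
ANY_ESCAPE = re.compile(r"(?:\\.|[^|\\]+)+")

# Made-up input: "a", an escaped backslash, a real delimiter, then "b".
sample = r"a\\|b"

pipe_only_fields = PIPE_ONLY.findall(sample)    # the pipe is read as escaped: one field
any_escape_fields = ANY_ESCAPE.findall(sample)  # the pipe is read as a delimiter: two fields
```

If the data never contains a backslash directly before a delimiter, both patterns give the same fields.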

Toto