I have a string of key=value pairs separated by a space character (I believe it will only ever be a single space), but the values may themselves contain spaces and other whitespace (e.g. newlines, tabs). For example:

a=1 b=cat c=1 and 2 d=3

becomes:

  • a=1
  • b=cat
  • c=1 and 2
  • d=3

i.e. I want to extract all the pairs as groups.

I cannot figure out the regex. My sample doesn't include a newline, but that could also happen.

I've tried the basics like:

(.+?=.+?)

\s?([^\s]+)

but these fail with spaces and newlines. I'm writing code around this as well, so I can tidy up any leading/trailing characters where needed; I would just rather do it with a regex than scan one character at a time.

Neil
  • Do you want to capture it as a group? So that you capture `a-1`, `b-2`, `c=1` and `d=3`? Because in that case you could use something like this `([a-z]=[0-9])` – Vivendi Sep 27 '22 at 09:47
  • yes, just need to separate the items, but this won't work as it needs to separate on space but also allow space in the value part – Neil Sep 27 '22 at 10:18

1 Answer

You can use

([^\s=]+)=([\w\W]*?)(?=\s+[^\s=]+=|$)

See the regex demo. Details:

  • ([^\s=]+) - Group 1: one or more chars other than whitespace and = char
  • = - a = char
  • ([\w\W]*?) - Group 2: any zero or more chars, as few as possible
  • (?=\s+[^\s=]+=|$) - a positive lookahead that requires, immediately to the right of the current location, either one or more whitespace chars followed by one or more chars other than whitespace and = and then a = char, or the end of the string.
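
The pattern described above can be tried out as follows. This is only a minimal sketch, assuming Python's `re` module (the question does not name a regex flavor); `parse_pairs` is a hypothetical helper name, not something from the question.

```python
import re

# Pattern from the answer: key, "=", lazy value, then a lookahead for
# either whitespace + next "key=" or the end of the string.
PAIR_RE = re.compile(r"([^\s=]+)=([\w\W]*?)(?=\s+[^\s=]+=|$)")

def parse_pairs(text):
    """Return (key, value) tuples in order of appearance (hypothetical helper)."""
    return [(m.group(1), m.group(2)) for m in PAIR_RE.finditer(text)]

pairs = parse_pairs("a=1 b=cat c=1 and 2 d=3")
for key, value in pairs:
    print(f"{key}={value}")
```

Note that a value which itself contains something shaped like ` key=` cannot be told apart from the start of a new pair, so it will be split there.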

A better way to match any character than [\w\W] is to use . together with the singleline/dotall modifier (if supported; see How do I match any character across multiple lines in a regular expression?). For example:

(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)
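
How the end-of-string alternative behaves depends on the flavor and flags in use. The sketch below assumes Python's `re` module, where `\Z` is the absolute end-of-string anchor (other flavors spell it `\z`), and shows a value containing a newline being cut short once multiline mode makes `$` match at line ends. The variable names are illustrative only.

```python
import re

text = "a=1 b=cat c=1 and\n2 d=3"

# (?s) lets "." match newlines; without multiline mode "$" means end of input here.
dotall = re.compile(r"(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)")
result = dict(dotall.findall(text))

# With multiline mode on, "$" also matches at each line end,
# so the lazy value for "c" stops before the newline.
multiline = re.compile(r"(?sm)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)")
truncated = dict(multiline.findall(text))

# An absolute end-of-string anchor is unaffected by multiline mode.
anchored = re.compile(r"(?sm)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|\Z)")
fixed = dict(anchored.findall(text))

print(repr(result["c"]), repr(truncated["c"]), repr(fixed["c"]))
```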
Wiktor Stribiżew
  • Thanks, you've gone the extra mile, I've never used lookahead before either :) However, it doesn't quite work with newlines, e.g. put a carriage return in the 'b=1 and 2' after the 'and' – Neil Sep 27 '22 at 10:05
  • @Neil Not sure about 1), do you mean you do not need the capturing groups? If yes, remove the capturing parentheses. 2) It will depend on the regex flavor, so e.g. https://regex101.com/r/5L2HnL/2 – Wiktor Stribiżew Sep 27 '22 at 10:08
  • Sorry, I updated my comment. If you put a carriage return after 'and' it doesn't work. Is there a way to include all whitespace? – Neil Sep 27 '22 at 10:11
  • @Neil `(?s)([^\s=]+)=(.*?)(?=\s+[^\s=]+=|\z)` is [working](https://regex101.com/r/5L2HnL/3). – Wiktor Stribiżew Sep 27 '22 at 10:14
  • superb. do you want to update your answer? – Neil Sep 27 '22 at 10:19