I want to extract key-value pairs from a string in which KEYs are separated from VALUEs by a colon (:) and pairs are separated by a comma (,). The problem is that VALUEs can themselves contain commas. For example:

category:information technology, computer,publisher:Elsevier (EV),subject:Ecology, Evolution, Behavior and Systematics

In this example, the KEYs to extract are category, publisher and subject. The final result must be as follows:

category = information technology, computer
publisher = Elsevier (EV)
subject = Ecology, Evolution, Behavior and Systematics

I tried to write a recursive regex, but it doesn't work:

(category|publisher|subject):(.*?)(?:,(?R)|.?)

Can someone help me solve this problem? Thanks.

M Sh

1 Answer

Well, if you can add a comma to the end of the string, I think this works:

(\w+):([^:]+),

Edit:

Jonathan Kuhn is totally right:

(\w+):([^:]+)(?:,|$)

This works without having to append a comma to the string.
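
To see the pattern in action, here is a minimal sketch that runs it on the sample string from the question. It assumes Python's `re` module, since the thread does not name a language. The names `PAIR`, `text` and `pairs` are only illustrative.

```python
import re

# Pattern from the answer: a word key, a colon, then a value that contains
# no colon, terminated by a comma or by the end of the string.
PAIR = re.compile(r"(\w+):([^:]+)(?:,|$)")

# Sample string from the question.
text = ("category:information technology, computer,"
        "publisher:Elsevier (EV),"
        "subject:Ecology, Evolution, Behavior and Systematics")

# findall returns one (key, value) tuple per match because the pattern
# has two capturing groups; dict() keeps them in order of appearance.
pairs = dict(PAIR.findall(text))

for key, value in pairs.items():
    print(f"{key} = {value}")
```

The greedy `[^:]+` first runs up to the next colon and then backtracks to the last comma before it, which is why commas inside a value are kept. The trade-off is that a value can never contain a colon.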

user372495
  • If you replaced the trailing comma with `(?:,|$)`, it would match a comma or the end of the string in a non-capturing group, eliminating the need to add a comma to the end of the string. https://regex101.com/r/gG9kF3/1 – Jonathan Kuhn Feb 19 '16 at 00:32
  • Thanks. This regex works correctly, but can you analyse the regex below, which also works: `(\b\w+):(.*?(?=,\w+:|$))` – M Sh Feb 19 '16 at 07:53
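
As a rough analysis of the regex in the last comment: `\b\w+` captures the key, and the lazy `.*?` grows one character at a time until the lookahead `(?=,\w+:|$)` sees either a comma directly followed by a word and a colon (the start of the next pair) or the end of the string. The lookahead consumes nothing, so the separating comma is simply skipped when the next match is searched for. The sketch below assumes Python's `re` module. `parse_pairs` and `url_sample` are hypothetical names and data, not part of the thread.

```python
import re

# Lookahead variant from the comment: the value ends where ",key:" begins
# or where the string ends.
LOOKAHEAD_PAIR = re.compile(r"(\b\w+):(.*?(?=,\w+:|$))")


def parse_pairs(text):
    """Return a dict of key/value pairs found by the lookahead pattern."""
    return {m.group(1): m.group(2) for m in LOOKAHEAD_PAIR.finditer(text)}


# Sample string from the question.
sample = ("category:information technology, computer,"
          "publisher:Elsevier (EV),"
          "subject:Ecology, Evolution, Behavior and Systematics")

# Hypothetical input whose first value contains a colon.
url_sample = "site:https://example.org,note:a, b"

print(parse_pairs(sample))
print(parse_pairs(url_sample))
```

Unlike `[^:]+`, this variant tolerates colons inside a value, as in the hypothetical `url_sample`. It would still split a value that happens to contain a comma immediately followed by a word and a colon.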