Capture repeating elements between two words in JSON using RegEx

Question

I have a web service response, returning JSON, that I need to parse. I would like to capture all instances of "id":"123-abc-345" between a starting word and an ending word. I need to know all IDs so that I can randomly choose between them. The reason I have to look between the "key beginning word" and "key ending word" is that the ID element appears in the document in various places (even before the beginning and ending words), but I'm just interested in the IDs between "beginning" and "ending".

Example data I'm using:

[]}6778:---esghsrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---AAAAA{[]}bla...esghshrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---esghsrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---esghsrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---esghsrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---esghsrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---esghsrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---esghsrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---ZZZZZ[]}6778:---esghsrth"id":"95907bc09-568976456-c6a5a-4f87g"[]}6778:---

I have managed to get as far as: (.*?)(\"id\":\"[^"]*)+ which DOES capture the IDs I'm interested in - unfortunately also the ones I don't need (before AAAAA and after ZZZZZ).

This and this and this comes close - but still no cigar. Any help would be greatly appreciated - either a pointer in the right direction or a complete working regex (even though a working example would be preferred :-) )

Thanks regex gurus !

score 1 · Answer 1 · answered Nov 26 '16 at 18:23


You can first capture everything between these two key words with (?<=AAAAA).*?(?=ZZZZZ), then search the result with the regex (?<=\"id\":\").*?(?=\"). The latter matches everything between "id":" and the closing ", excluding those delimiters.
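In case it helps, here is a minimal Python sketch of the two steps. The sample string is a shortened, made-up stand-in for the data in the question, and the function name ids_between is only an illustration. The original target is LoadRunner/C, so treat this as a way to check the regexes rather than a drop-in solution.

```python
import random
import re

# Shortened, made-up sample: one id before AAAAA, two between the markers,
# one after ZZZZZ.
data = (
    'x"id":"before-1"[]}---AAAAA{[]}bla..."id":"inside-1"[]}---'
    'x"id":"inside-2"[]}---ZZZZZ[]}---x"id":"after-1"[]}---'
)


def ids_between(text, start="AAAAA", end="ZZZZZ"):
    """Return the id values between the first start marker and the next end marker."""
    # Step 1: isolate the text between the two markers.
    # re.DOTALL lets "." cross newlines in case the response spans several lines.
    window = re.search(
        "(?<=" + re.escape(start) + ").*?(?=" + re.escape(end) + ")",
        text,
        re.DOTALL,
    )
    if window is None:
        return []
    # Step 2: collect everything between "id":" and the closing quote.
    return re.findall(r'(?<="id":").*?(?=")', window.group(0))


ids = ids_between(data)
print(ids)
# Pick one of the captured ids at random, as the question asks.
print(random.choice(ids))
```

Only the ids between the markers end up in the list, so a random pick can never return one from before AAAAA or after ZZZZZ.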


Nicolas


Great, thanks for the solution Nicolas ! I guess I'll have to dust off my C string manipulation and regex skills (this is for Loadrunner...) but this will do nicely. Thanks again. Last question - I can't combine the two steps in one pass, right ? – Hiro Protagonist Nov 26 '16 at 18:42

I don't think so. – Nicolas Nov 26 '16 at 19:07
Nicolas, Evgeniy: I accepted Evgeniy's answer as the "right" one because it saves me a *lot* of headaches in the processing I need to do. They both work, and I appreciate your help very much ! Thank you ! – Hiro Protagonist Nov 26 '16 at 21:26

score 1 · Accepted Answer · edited May 23 '17 at 12:24


All in one step, but a little tricky (demo):

AAAAA(?!\"id\":\"[^"]*\").*?(\"id\":\"[^"]*\")|(?<!^)\G(?!\"id\":\"[^"]*\").*?(\"id\":\"[^"]*\")(?=.*ZZZZZ)

Simplified version, where \"id\":\"[^"]*\" is replaced by id:

AAAAA(?!id).*?(id)|(?<!^)\G(?!id).*?(id)(?=.*ZZZZZ)

Inspired by @nhahtdh explanation.
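For anyone trying this in an engine without \G (Python's built-in re module has no \G, for example), the same idea can be approximated with a loop. This is only a sketch under a few assumptions: the sample string is made up, the names NEXT_ID and ids_one_scan are hypothetical, and unlike the regex above it applies the ZZZZZ lookahead to every id, including the first one after AAAAA. It also drops the (?!id) guard, so an id sitting directly after AAAAA would be captured too.

```python
import re

# Shortened, made-up sample: one id before AAAAA, two between the markers,
# one after ZZZZZ.
data = (
    'x"id":"before-1"[]}---AAAAA{[]}bla..."id":"inside-1"[]}---'
    'x"id":"inside-2"[]}---ZZZZZ[]}---x"id":"after-1"[]}---'
)

# Lazily skip to the next id and accept it only while ZZZZZ still lies ahead.
NEXT_ID = re.compile(r'.*?"id":"([^"]*)"(?=.*ZZZZZ)', re.DOTALL)


def ids_one_scan(text):
    """Walk forward from AAAAA, chaining each match to the end of the previous one."""
    start = text.find("AAAAA")
    if start == -1:
        return []
    pos = start + len("AAAAA")
    found = []
    while True:
        # Pattern.match(text, pos) only matches starting exactly at pos,
        # which plays the role of \G here.
        m = NEXT_ID.match(text, pos)
        if m is None:
            break
        found.append(m.group(1))
        pos = m.end()
    return found


print(ids_one_scan(data))
```

The chain stops at the first id that has no ZZZZZ after it, so ids past the end marker are never collected. As with the regex above, a second ZZZZZ later in the input would let ids after the first one through.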


answered Nov 26 '16 at 19:10

Evgeniy Maynagashev


Evgeniy - wow. I think I need some time to understand how you're doing it ;-) It would actually be very much preferable to do it in one pass - that'll save me the trouble with C string manipulation/regex (bleh) - and all the header/library inclusion stuff. Spasiba !!! – Hiro Protagonist Nov 26 '16 at 21:22
