Lazy (ungreedy) matching multiple groups using regex

Question

I would like to grab the contents of any value between pairs of <tag></tag> tags.

<tag>
This is one block of text
</tag>

<tag>
This is another one
</tag>

The regex I have come up with is

/<tag>(.*)</tag>/m

However, it appears to be greedy and captures everything up to the very last </tag>. I would like it to be as lazy as possible, so that every time it sees a closing tag, it treats that as the end of a match group and starts over.

How can I write the regex so that I will be able to get multiple matches in the given scenario?

I have included a sample of what I am describing in the following link

http://rubular.com/r/JW5M3rnqIE

Note: This is not XML, nor is it really based on any existing standard format. I won't need anything sophisticated like a full-fledged library that comes with a nice parser.

One more important thing for you to know is that by using regexen on XML, [you are playing with Cthulhu](http://stackoverflow.com/questions/1732348). Later, don't say you haven't been warned. — Boris Stitnicky, Oct 14 '12 at 19:12
@BorisStitnicky, no need for cargo cult here. Regexes are not recursive, that's all. — nalply, Oct 14 '12 at 19:19
...every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp ... the song of re̸gular expression parsing will extinguish the voices of mortal man from the sphere I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful the final snuffing of the lies of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST — Boris Stitnicky, Oct 14 '12 at 19:22

Ωmega · Accepted Answer · 2012-10-14T18:45:12.767

Use this regex pattern:

/<tag>(.*?)<\/tag>/im

The lazy (non-greedy) quantifier is .*?, not .*. In Ruby, the m flag lets . match newlines, which the multi-line sample needs.

To find multiple occurrences, use:

string.scan(/<tag>(.*?)<\/tag>/im)
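As a rough illustration of the difference, here is a minimal sketch that runs both the greedy and the lazy pattern against the sample from the question. The variable names (`text`, `greedy`, `lazy`, `blocks`) are made up for this example, and the `flatten`/`strip` step is only one possible way to tidy up the captures.

```ruby
# Sample input taken from the question.
text = <<~TEXT
  <tag>
  This is one block of text
  </tag>

  <tag>
  This is another one
  </tag>
TEXT

# Greedy: .* runs on to the last </tag>, so the whole input is one match.
greedy = text.scan(/<tag>(.*)<\/tag>/m)

# Lazy: .*? stops at the first </tag> it can, so each pair matches on its own.
lazy = text.scan(/<tag>(.*?)<\/tag>/m)

# With a capture group, scan returns one array per match;
# flatten and strip to get the bare contents.
blocks = lazy.flatten.map(&:strip)

puts greedy.length   # 1
puts blocks.inspect  # ["This is one block of text", "This is another one"]
```

Since `scan` returns an array of arrays when the pattern contains a group, `lazy` holds two one-element arrays here; the surrounding newlines are part of each capture until they are stripped.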

edited Oct 14 '12 at 18:45

answered Oct 14 '12 at 18:39

Thanks! I didn't think of trying out things ruby had for regex – MxLDevs Oct 14 '12 at 18:56
