I would like to grab the content between each pair of <tag></tag> tags.

<tag>
This is one block of text
</tag>

<tag>
This is another one
</tag>

The regex I have come up with is

/<tag>(.*)</tag>/m

However, it appears to be greedy: the group captures everything up to the very last </tag>. I would like it to be as lazy as possible, so that every time it sees a closing tag, it treats that as the end of a match and starts over.

How can I write the regex so that I will be able to get multiple matches in the given scenario?

I have included a sample of what I am describing in the following link

http://rubular.com/r/JW5M3rnqIE

Note: This is not XML, nor is it really based on any existing standard format. I won't need anything sophisticated like a full-fledged library that comes with a nice parser.

Ωmega
MxLDevs
  • 1
    One more important thing for you to know is that by using regexen on xml, [you are playing with Cthulhu](http://stackoverflow.com/questions/1732348). Later, don't say you haven't been warned. – Boris Stitnicky Oct 14 '12 at 19:12
  • @BorisStitnicky, no need for cargo cult here. Regexes are not recursive, that's all. – nalply Oct 14 '12 at 19:19
  • 2
    ...every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp ... the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST – Boris Stitnicky Oct 14 '12 at 19:22

1 Answer

Use this regex pattern:

/<tag>(.*?)<\/tag>/im

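The lazy (non-greedy) quantifier is .*?, not .*. It stops at the first </tag> it can reach instead of the last one. Note that in Ruby the m modifier makes . match newlines as well, which is needed here because the content spans several lines; the i modifier only makes the match case-insensitive.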
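As a rough illustration of the difference, the sketch below runs both quantifiers against a string modeled on the question's sample. The variable names (text, greedy, lazy) and the exact sample string are assumptions made for the example, not part of the original post.

```ruby
# Sample input modeled on the two blocks in the question (assumed for illustration).
text = "<tag>\nThis is one block of text\n</tag>\n\n<tag>\nThis is another one\n</tag>\n"

# String#[] with a regex and a group index returns that capture group of the first match.
greedy = text[/<tag>(.*)<\/tag>/m, 1]   # .* runs on to the LAST </tag>
lazy   = text[/<tag>(.*?)<\/tag>/m, 1]  # .*? stops at the FIRST </tag>

puts greedy.inspect
puts lazy.inspect
```

The greedy capture swallows the first closing tag and the second opening tag, while the lazy capture contains only the first block.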

To find multiple occurrences, use:

string.scan(/<tag>(.*?)<\/tag>/im) 
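A minimal sketch of how the scan call might be used end to end, assuming the input looks like the sample in the question. The names text and blocks are made up for the example, and the flatten and strip steps are optional additions to tidy the result.

```ruby
# Input modeled on the question's sample; the squiggly heredoc needs Ruby 2.3 or later.
text = <<~TEXT
  <tag>
  This is one block of text
  </tag>

  <tag>
  This is another one
  </tag>
TEXT

# With one capture group, scan returns an array of one-element arrays,
# so flatten it and strip the surrounding newlines from each capture.
blocks = text.scan(/<tag>(.*?)<\/tag>/im).flatten.map(&:strip)

p blocks  # => ["This is one block of text", "This is another one"]
```

Without flatten, each match would come back wrapped in its own array, because scan groups results by capture group.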
Ωmega