I'm having an issue trying to capture a group on a string:

"type=gist\nYou need to gist this though\nbecause its awesome\nright now\n</code></p>\n\n<script src=\"https://gist.github.com/3931634.js\"> </script>\n\n\n<p><code>Not code</code></p>\n"

My regex currently looks like this:

/<code>([\s\S]*)<\/code>/

My goal is to get everything between the code tags. Unfortunately, the regex matches all the way to the second closing code tag. Is there a way to match everything inside the code tags only up to the first occurrence of the closing tag?

Jack Slingerland
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Reactormonk Oct 22 '12 at 18:39

2 Answers

All repetition quantifiers in regular expressions are greedy by default: they match as many characters as possible. Make the * ungreedy (lazy) by appending a ?, so that it matches as few characters as possible:

/<code>([\s\S]*?)<\/code>/

But please consider using a DOM parser instead. Regex is just not the right tool to parse HTML.
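
To see the difference on the string from the question, here is a minimal Ruby sketch. It assumes the string begins with an opening `<p><code>` (the pasted string does not show it, but the regex in the question implies it); the variable names `html`, `greedy`, and `lazy` are only illustrative.

```ruby
# The question's string, ASSUMING it starts with an opening <p><code> tag.
html = "<p><code>type=gist\nYou need to gist this though\nbecause its awesome\nright now\n</code></p>\n\n<script src=\"https://gist.github.com/3931634.js\"> </script>\n\n\n<p><code>Not code</code></p>\n"

# String#[] with a regex and a group index returns that capture group.
greedy = html[/<code>([\s\S]*)<\/code>/, 1]   # runs on to the LAST </code>
lazy   = html[/<code>([\s\S]*?)<\/code>/, 1]  # stops at the FIRST </code>

puts greedy.end_with?("Not code")  # true: the greedy match swallowed the second block
puts lazy                          # only the four lines of the first block
```

The only change between the two patterns is the `?` after the `*`.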

Martin Ender
  • Don't worry, I'm only doing this as an exercise. If I really needed to parse html I'd be using a DOM parser. – Jack Slingerland Oct 22 '12 at 18:43
  • @JackSlingerland in this case, you shall be forgiven ;). [Here](http://www.regular-expressions.info/) is a really good tutorial on regular expressions with [this article](http://www.regular-expressions.info/repeat.html) applying specifically to your problem. – Martin Ender Oct 22 '12 at 18:48

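I just learned that, for going through multiple parts, String#scan is a very nice way of visiting all occurrences of code:

string.scan(/<code>(.*?)<\/code>/m) {
  puts $1
}

Here string is the variable holding your text (scan is an instance method, so it cannot be called on the String class itself), and the m flag lets . match newlines as well. But yes, getting a proper parser is better.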
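
As a small variation, when `scan` is called without a block it returns the captures as an array instead of yielding them one by one. The sketch below uses a made-up input string (`sample` is hypothetical, not the string from the question):

```ruby
# Hypothetical input with two code blocks, the first spanning two lines.
sample = "<code>first\nblock</code> text <code>second</code>"

# With one capture group, scan returns an array of one-element arrays,
# so flatten turns it into a plain list of the captured strings.
blocks = sample.scan(/<code>(.*?)<\/code>/m).flatten

p blocks  # ["first\nblock", "second"]
```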

ineiti