Ruby Regular Expressions: Matching if substring doesn't exist

Question

I'm having an issue trying to capture a group on a string:

"type=gist\nYou need to gist this though\nbecause its awesome\nright now\n</code></p>\n\n<script src=\"https://gist.github.com/3931634.js\"> </script>\n\n\n<p><code>Not code</code></p>\n"

My regex currently looks like this:

/<code>([\s\S]*)<\/code>/

My goal is to capture everything between the code tags. Unfortunately, the match runs all the way to the second closing </code> tag. Is there a way to match everything inside the code tags only up to the first occurrence of the closing </code> tag?

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Reactormonk, Oct 22 '12 at 18:39

score 4 · Accepted Answer · answered Oct 22 '12 at 18:38

All repetition quantifiers in regular expressions are greedy by default (matching as many characters as possible). Make the * ungreedy (lazy) by appending a ?, like this:

/<code>([\s\S]*?)<\/code>/
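
To see the difference, here is a small sketch comparing the two patterns. The sample string and the constant names are made up for illustration and are not the asker's actual data:

```ruby
# Hypothetical sample input containing two <code> elements.
HTML = "<p><code>first\nsnippet</code></p>\n<p><code>second</code></p>\n"

GREEDY = /<code>([\s\S]*)<\/code>/   # * grabs as much as it can
LAZY   = /<code>([\s\S]*?)<\/code>/  # *? stops at the first possible end

# String#[] with a regex and a group index returns that capture group.
greedy_capture = HTML[GREEDY, 1]
lazy_capture   = HTML[LAZY, 1]

p greedy_capture  # runs through to the last </code> in the string
p lazy_capture    # => "first\nsnippet"
```

The greedy version swallows the first closing tag and everything up to the final one, while the lazy version ends at the first closing tag it can reach.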

But please consider using a DOM parser instead. Regex is just not the right tool to parse HTML.

answered Oct 22 '12 at 18:38

Martin Ender

Don't worry, I'm only doing this as an exercise. If I really needed to parse html I'd be using a DOM parser. – Jack Slingerland Oct 22 '12 at 18:43
@JackSlingerland in this case, you shall be forgiven ;). [Here](http://www.regular-expressions.info/) is a really good tutorial on regular expressions with [this article](http://www.regular-expressions.info/repeat.html) applying specifically to your problem. – Martin Ender Oct 22 '12 at 18:48

score 0 · Answer 2 · answered Oct 22 '12 at 21:20

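And I just learned that, for going through multiple matches,

string.scan(/<code>(.*?)<\/code>/m) {
  puts $1
}

is a very nice way of going through all occurrences of code. Note that scan is an instance method, so call it on your own string, and that the m flag is needed for . to match newlines in Ruby. But yes, getting a proper parser is better...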
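
As a rough sketch, scan can also be called without a block, in which case it returns the captures as an array. The helper name code_snippets and the sample string are hypothetical; the m flag is included on the assumption that a snippet may span several lines, since . does not match a newline in Ruby by default:

```ruby
# Hypothetical helper: collect the contents of every <code>...</code> pair.
def code_snippets(text)
  # With one capture group, scan returns an array of one-element arrays,
  # so flatten turns it into a plain list of strings.
  text.scan(/<code>(.*?)<\/code>/m).flatten
end

sample = "<code>a\nb</code> and <code>c</code>"
p code_snippets(sample)  # => ["a\nb", "c"]
```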

answered Oct 22 '12 at 21:20

ineiti
