To match the <div> when it's all on one line, use:
/<div[^>]*>/
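But that pattern is fragile. It will miss the tag if you read the input one line at a time and the tag contains a newline. It will also break if an attribute value contains > or if the tag is written in uppercase as <DIV>, both of which could happen.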
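As a rough illustration of how easily the pattern goes wrong, here is a minimal sketch that runs it against a few made-up inputs. The helper names and sample strings are hypothetical and exist only for this demonstration.

```ruby
# Pattern from the text above.
DIV_TAG = /<div[^>]*>/

# Hypothetical helper: returns the first match in a string, or nil.
def first_div_tag(html)
  html[DIV_TAG]
end

# Hypothetical helper: true if any single line contains a complete match.
def div_tag_on_some_line?(html)
  html.each_line.any? { |line| line.match?(DIV_TAG) }
end

# The simple one-line case works.
p first_div_tag('<div class="a">')              # => "<div class=\"a\">"

# Read line by line, a tag that spans two lines is never seen whole.
p div_tag_on_some_line?("<div\nclass=\"a\">")   # => false

# A ">" inside an attribute value ends the match too early.
p first_div_tag('<div title="a > b">')          # => "<div title=\"a >"

# Tag names are case-insensitive in HTML, but the pattern is not.
p first_div_tag('<DIV>')                        # => nil
```

Each failure can be patched with a more elaborate pattern, but every patch tends to expose another case.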
Eventually, after you've added all the extra checks for the possible ways a tag can be written, you'll want a better approach: a parser such as Nokogiri, which makes working with HTML and XML much easier.
For instance, since you're trying to tear apart the HTML:
<div>
<p>test content</p>
</div>
it's pretty easy to guess you really want to get to "test content". What if the HTML changed to:
<div><p>test content</p></div>
or worse:
<div
><p>
test
content
</div>
A browser won't care, nor will a good parser, but a regex will get upset and require rework.
require 'nokogiri'
require 'pp'
doc = Nokogiri.HTML(<<EOT)
<div
><p>
test
content
</div>
EOT
# Take the first <p>, then trim its text and collapse runs of whitespace.
pp doc.at('p').text.strip.gsub(/\s+/, ' ')
# => "test content"
That's why we recommend parsers.