
I have this string:

"lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"

I need to extract the text between each <tt>...</tt> pair into an array. I've tried:

"lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet".scan(/<tt>(.*)<\/tt>/)

but it doesn't give me the two strings. I get a single match running from the first <tt> to the last </tt>.

Peter Mortensen
zambetta
  • Great! The Cthulhu-prevention squad was hoping to enjoy a Christmas break, and now you come along! http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andrew Grimm Dec 27 '10 at 00:48

2 Answers


It is much better to use an HTML parser, even for a tiny fragment, unless you are sure the string will never change format and you control the process from end to end.

That said, since you asked for a regex, I'd use String#scan with the negated character class [^<]+, so each capture stops before the next "<":

str = "lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"

str.scan(%r{<tt>([^<]+)</tt>}).flatten # => ["text1", "text2"]
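If you'd rather skip the flatten, one alternative (a sketch, not something the answer above relies on) is to drop the capture group and use a lookbehind and lookahead, so scan returns the matched text directly. This version also uses [^<]* instead of [^<]+, on the assumption that you want an empty string for an empty <tt></tt>:

```ruby
str = "lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"

# With no capture group, scan returns each whole match as a plain string.
# (?<=<tt>) and (?=</tt>) require the tags to be present but don't
# include them in the match, so only the text between them comes back.
texts = str.scan(%r{(?<=<tt>)[^<]*(?=</tt>)})
p texts # => ["text1", "text2"]
```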

Just to show how simple using a parser is:

require 'nokogiri'
doc = Nokogiri::HTML(str)
doc.css('tt').map(&:text) # => ["text1", "text2"]

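The benefit is flexibility and robustness: the parser works on the document structure, so attributes on the tag, nested markup, and HTML entities don't trip it up the way they trip up a regex.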
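To make the robustness point concrete, here is a small sketch of inputs that the regex version above silently gets wrong. The sample strings are made up for illustration. With Nokogiri, doc.css('tt').map(&:text) should still return the text in each of these cases, because it reads the parsed tree instead of raw characters.

```ruby
pattern = %r{<tt>([^<]+)</tt>}

# An attribute on the opening tag: the literal "<tt>" never appears.
attr_result = '<tt class="code">text1</tt>'.scan(pattern).flatten
p attr_result # => []

# Nested markup: [^<]+ stops at the "<" of "<b>", so "</tt>" can't follow.
nested_result = '<tt>text <b>1</b></tt>'.scan(pattern).flatten
p nested_result # => []

# Entities come back undecoded.
entity_result = '<tt>a &amp; b</tt>'.scan(pattern).flatten
p entity_result # => ["a &amp; b"]
```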

the Tin Man

Try .scan(/<tt>(.*?)<\/tt>/)

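Here *? is a so-called reluctant (also called lazy or non-greedy) quantifier: it matches as few characters as possible, so each match ends at the first </tt> instead of the last one.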

s = "lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"
puts s.scan(/<tt>(.*?)<\/tt>/).inspect #  => [["text1"], ["text2"]]
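A side-by-side sketch of the greedy pattern from the question and the reluctant one, plus one Ruby-specific caveat: . does not match a newline unless the /m modifier is set, so <tt> content that spans lines is skipped without it. The multi-line sample string is invented for illustration.

```ruby
s = "lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"

# Greedy: .* runs to the end of the string, then backs off only as far
# as the LAST </tt>, swallowing everything in between.
greedy = s.scan(/<tt>(.*)<\/tt>/).flatten
p greedy # => ["text1</tt> ipsum <tt>text2"]

# Reluctant: .*? stops at the FIRST </tt> it can reach.
lazy = s.scan(/<tt>(.*?)<\/tt>/).flatten
p lazy # => ["text1", "text2"]

# Without /m, . can't cross the newline, so nothing matches.
multi = "<tt>line1\nline2</tt>"
p multi.scan(/<tt>(.*?)<\/tt>/).flatten  # => []
# With /m, . also matches "\n".
p multi.scan(/<tt>(.*?)<\/tt>/m).flatten # => ["line1\nline2"]
```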
Nikita Rybak