How to extract values from an XML fragment using a Ruby regular expression

Question

I have this string:

"lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"

I need to extract the text between each <tt>...</tt> pair into an array. I've tried:

"lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet".scan(/<tt>(.*)<\/tt>/)

but with no luck...
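For reference, this is what that attempt returns (a quick sketch in plain Ruby, no gems). The `.*` is greedy: it runs to the end of the string and then backtracks only as far as the last </tt>, so both values end up in a single capture:

```ruby
str = "lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"

# Greedy .* consumes everything up to the LAST "</tt>", so scan finds one
# match whose capture spans both <tt> elements.
greedy = str.scan(/<tt>(.*)<\/tt>/)
p greedy # => [["text1</tt> ipsum <tt>text2"]]
```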

Great! The Cthulhu-prevention squad was hoping to enjoy a Christmas break, and now you come along! http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Andrew Grimm, Dec 27 '10 at 00:48

the Tin Man · Answer 1 · answered Dec 26 '10 at 05:23

It is much better to use a parser, even for a tiny fragment, unless you are sure the string's format will never change and you control the process end to end.

That said, to meet your requirement of using a regex, I'd use String#scan with a negated character class, [^<]+, which cannot run past the next tag:

str = "lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"

str.scan(%r{<tt>([^<]+)</tt>}).flatten # => ["text1", "text2"]
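If you want a flat array without the .flatten call, one variant (a sketch of my own, not part of the answer above) is to replace the capture group with zero-width lookarounds, so String#scan returns the whole matches directly. This relies on Ruby's regex engine accepting the fixed-length lookbehind (?<=<tt>):

```ruby
str = "lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"

# (?<=<tt>) and (?=</tt>) assert the surrounding tags without consuming them.
# With no capture group, String#scan returns an array of whole matches.
texts = str.scan(%r{(?<=<tt>)[^<]+(?=</tt>)})
p texts # => ["text1", "text2"]
```

Like the original pattern, [^<]+ skips empty <tt></tt> elements and any element that contains a nested tag.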

Just to show how simple it is to use a parser (here Nokogiri):

require 'nokogiri'
doc = Nokogiri::HTML(str)
doc.css('tt').map(&:text) # => ["text1", "text2"]

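The benefit is flexibility and robustness: a parser copes with attributes, nested tags and other changes in the markup that would break a hand-written regex.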
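To make the robustness point concrete, here is a small sketch using a made-up fragment (hypothetical input, not from the question) in which a <tt> element contains nested markup. Neither regex returns the plain text:

```ruby
# Hypothetical input: a <tt> element with a nested <b> tag inside it.
html = "see <tt>a <b>bold</b> word</tt> here"

# [^<]+ cannot cross the "<" of the nested tag, so this element is skipped.
negated = html.scan(%r{<tt>([^<]+)</tt>}).flatten
p negated # => []

# The lazy .*? does match, but the nested markup comes back with the text.
lazy = html.scan(%r{<tt>(.*?)</tt>}).flatten
p lazy # => ["a <b>bold</b> word"]
```

A parser such as Nokogiri should instead give the element's text content, "a bold word".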

score 2 · Answer 2 · answered Dec 25 '10 at 21:02


Try .scan(/<tt>(.*?)<\/tt>/)

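Here *? is a so-called reluctant (lazy, or non-greedy) quantifier: it matches as few characters as possible, so each match ends at the first </tt> that follows instead of the last one in the string.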

s = "lorem <tt>text1</tt> ipsum <tt>text2</tt>dolor si amet"
puts s.scan(/<tt>(.*?)<\/tt>/).inspect #  => [["text1"], ["text2"]]
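Adding .flatten gives a flat array. Note that this pattern and the [^<]+ pattern from the other answer differ on an empty element; a quick sketch with a hypothetical input (not from the question):

```ruby
# Hypothetical input containing an empty <tt></tt> element.
s = "a <tt></tt> b <tt>x</tt>"

# Lazy .*? may match zero characters, so the empty element yields "".
lazy = s.scan(/<tt>(.*?)<\/tt>/).flatten
p lazy # => ["", "x"]

# [^<]+ requires at least one character, so the empty element is skipped.
negated = s.scan(%r{<tt>([^<]+)</tt>}).flatten
p negated # => ["x"]
```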


Nikita Rybak

