Why is this Regex result unexpected

Question

The regex in question is

/(<iframe.*?><\/iframe>)/

I am using this Ruby regex to match sections of a string and then create an array of the results.

The string is

"<p><iframe src=\"http://www.dailymotion.com/embed/video/k18WBkRTMldXzB7JYW5?logo=0&#038;info=0\" frameborder=\"0\" height=\"450\" width=\"580\"></iframe></p>\n<p>#1<br />\n<iframe src=\"https://www.cloudy.ec/embed.php?id=cabe5d3ba31da\" allowfullscreen=\"\" frameborder=\"0\" height=\"420\" width=\"640\"></iframe></p>\n<p>#2<br />\n<iframe src=\"https://www.cloudy.ec/embed.php?id=b03d31e4b5663\" allowfullscreen=\"\" frameborder=\"0\" height=\"420\" width=\"640\"></iframe></p>\n<p>#3<br />\n<iframe src=\"https://www.cloudy.ec/embed.php?id=f63895add1aac\" allowfullscreen=\"\" frameborder=\"0\" height=\"420\" width=\"640\"></iframe></p>\n"

I am calling the regex with .match() like so:

/(<iframe.*?><\/iframe>)/.match(entry.content).to_a

The result is a duplicate of the first match

["<iframe src=\"http://www.dailymotion.com/embed/video/k18WBkRTMldXzB7JYW5?logo=0&#038;info=0\" frameborder=\"0\" height=\"450\" width=\"580\"></iframe>", "<iframe src=\"http://www.dailymotion.com/embed/video/k18WBkRTMldXzB7JYW5?logo=0&#038;info=0\" frameborder=\"0\" height=\"450\" width=\"580\"></iframe>"]

I tested the regex on Rubular and it worked there: http://rubular.com/r/CYF0vgQtrX

It was the first approach I thought of. If this wasn't going to work, then I was going to use nokogiri — Patrick, May 23 '14 at 04:08

7stud · Accepted Answer · 2014-05-23T05:55:40.530

The result is a duplicate of the first match

Even though the docs for Regexp#match() do a horrible job of describing what match() does, it actually finds only the first match:

str = "abc"
md = /./.match(str)
p md.to_a

--output:--
["a"]

Regexp#match() returns a MatchData object when there is a match (and nil otherwise). A MatchData object holds the whole match plus whatever each group matched. If you call to_a() on a MatchData object, the return value is an Array containing the whole match followed by whatever matched each group in the regex:

str = "abc"
md = /(.)(.)(.)/.match(str)
p md.to_a

--output:--
["abc", "a", "b", "c"]

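Because you specified one group in your regex, the array has two elements: the whole match and what matched your group. Your group wraps the entire pattern, so both elements are the same text, which is why the first match appears to be duplicated.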
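A minimal sketch of that behavior, assuming a shortened, made-up stand-in for entry.content (the `content` string below is hypothetical, not the real data). It shows that match stops at the first iframe and that to_a returns the whole match followed by the single group, which here are identical:

```ruby
# Hypothetical, shortened stand-in for entry.content from the question.
content = '<p><iframe src="a"></iframe></p><p><iframe src="b"></iframe></p>'

md = /(<iframe.*?><\/iframe>)/.match(content)

puts md[0]                # whole match: <iframe src="a"></iframe>
puts md[1]                # group 1: the same text, because the group wraps the whole pattern
puts md.to_a.length       # 2, i.e. [whole match, group 1]
puts md.captures.inspect  # only the groups, without the whole match

# match stops at the first hit; the second iframe is still in the unmatched remainder.
puts md.post_match
```

If you only want what the groups matched, captures is the MatchData method for that; to_a always prepends the whole match.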

[A regex] was the first approach I thought of. If this wasn't going to work, then I was going to use nokogiri

From now on, nokogiri should be your first thought...because:

If you have a programming problem and you think, "I'll use a regex," now you have two problems.

Green Su · Answer 2 · 2014-05-23T05:01:45.877


You should use scan instead of match here.

entry.content.scan(/<iframe.*?><\/iframe>/)

Using /(<iframe.*?><\/iframe>)/ with scan will give you a 2D array. The String#scan documentation says:

If the pattern contains groups, each individual result is itself an array containing one entry per group.
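To make the difference concrete, here is a small sketch comparing scan with and without the group, again on a hypothetical two-iframe stand-in for entry.content:

```ruby
# Hypothetical two-iframe stand-in for entry.content.
content = '<p><iframe src="a"></iframe></p><p><iframe src="b"></iframe></p>'

# Without a group, scan returns one string per match.
without_group = content.scan(/<iframe.*?><\/iframe>/)

# With a group, each result is an array holding one entry per group.
with_group = content.scan(/(<iframe.*?><\/iframe>)/)

p without_group  # ["<iframe src=\"a\"></iframe>", "<iframe src=\"b\"></iframe>"]
p with_group     # [["<iframe src=\"a\"></iframe>"], ["<iframe src=\"b\"></iframe>"]]
```

Unlike match, scan keeps going after the first hit, so both iframes are returned.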

edited May 23 '14 at 05:01

answered May 23 '14 at 03:55

Green Su


Thanks, it works. What's your reasoning for using `scan`? Why wasn't `match` working? Also, `scan` creates a 2D array instead of an array of strings. I used `flatten`, but do you know of any other way around that? – Patrick May 23 '14 at 04:02
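On the `flatten` question, a minimal sketch of a few ways to get a flat array of strings directly, using the same hypothetical stand-in string. Which one to pick is a matter of taste; a non-capturing group `(?:...)` is useful when you need grouping (for alternation or a quantifier) but don't want scan to produce nested arrays:

```ruby
# Hypothetical two-iframe stand-in for entry.content.
content = '<p><iframe src="a"></iframe></p><p><iframe src="b"></iframe></p>'

# Option 1: drop the capturing group entirely.
plain = content.scan(/<iframe.*?><\/iframe>/)

# Option 2: keep the grouping but make it non-capturing with (?:...).
non_capturing = content.scan(/(?:<iframe.*?><\/iframe>)/)

# Option 3: keep the capturing group and take the single capture from each result.
first_capture = content.scan(/(<iframe.*?><\/iframe>)/).map(&:first)

# All three give the same flat array of strings that flatten would.
p plain == non_capturing && plain == first_capture  # true
```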
