Ruby Regex Web Parsing

Question

I am working on a simple Ruby script to parse the names of race horses from a webpage. This Regex works on http://rubular.com/, but my script does not print anything when I run it.

require 'open-uri';

url = "http://www.bloodhorse.com/horse-racing/race/race-results";
connection = open(url);
content = connection.read;

if(content =~ /(<span class="horseName">)(\n)(.*?)(\>)(.*?)(<\/a>)/)
    print $5,"\n";
end

An example of some of the page's source is:

<li value="2">
<span class="horseName">
<a href="/horse-racing/thoroughbred/felonious-fred/2010">Felonious Fred</a>

So I would think that my script should return the 5th capture of the matching Regex, which in this case should be "Felonious Fred". What am I doing wrong?

I feel it necessary to link this immortal answer from the Java section of SO : http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — mcfinnigan, Oct 23 '13 at 10:10

score 0 · Answer 1 · answered Oct 23 '13 at 10:08

If you are scraping a webpage, I suggest you use the Nokogiri gem. It parses the HTML for you, which saves you the trouble of writing a regex.

JunaidKirkire
