4

I'm looking for a forgiving HTML parser for scraping HTML and extracting data in Ruby. I've had success using BeautifulSoup for this - what is the ruby equivalent?

Adam Loving

2 Answers

6

Nokogiri

Also see: Nokogiri vs Hpricot before making a choice. Nokogiri appears to outperform Hpricot (I haven't benchmarked it myself) and has a nice syntax, IMO.

Uku Loskit
  • Thank you. I used Nokogiri and it was sufficient for my purposes. I think the HTML I threw at it was well-formed, so I haven't researched how fault-tolerant it is. – Adam Loving Sep 16 '10 at 17:34
  • 2
    Update for 2013: the Hpricot README on GitHub says it is no longer maintained and recommends Nokogiri instead. – antinome Jan 09 '13 at 02:33
0

There was a Rubyful Soup gem, a Ruby port of BeautifulSoup, but it is no longer maintained and its site now recommends Hpricot.

Daniel Vandersluis