Method to parse HTML document in Ruby?

Question

Like the DOMDocument class in PHP, is there any class in Ruby (i.e. in core Ruby) to parse an HTML document and get the values of its node elements?

score 49 · Accepted Answer · edited Sep 10 '12 at 05:40

There is no built-in HTML parser (yet), but some very good ones are available, in particular Nokogiri.

Meta-answer: For common needs like these, I'd recommend checking out the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers.

edited Sep 10 '12 at 05:40 by the Tin Man

answered Mar 31 '10 at 17:16 by Marc-André Lafortune

score 9 · Answer 2 · answered Mar 31 '10 at 17:04

You should check out Hpricot. It's exceedingly good. It's not 'core' Ruby, but it's a commonly used gem.

answered Mar 31 '10 at 17:04 by Peter

2

Hpricot sadly is no more. Nokogiri is now the preferred solution. – superluminary Oct 14 '13 at 11:27

score 6 · Answer 3 · edited Feb 11 '17 at 08:02

Ruby Cheerio is a jQuery-style HTML parser for Ruby: a much simplified version of Nokogiri, aimed at crawlers. It is the Ruby version of the popular Node.js package cheerio.

Follow the link for a simple crawler example.

gem install ruby-cheerio

require 'ruby-cheerio'

jQuery = RubyCheerio.new("<html><body><h1 class='one'>h1_1</h1><h1>h1_2</h1></body></html>")

jQuery.find('h1').each do |head_one|
  p head_one.text
end

# getting attribute values like jQuery.
p jQuery.find('h1.one')[0].prop('h1','class')

# function chaining similar to jQuery.
p jQuery.find('body').find('h1').first.text

edited Feb 11 '17 at 08:02

answered Feb 08 '17 at 16:42 by dineshsprabu

Very good approach! Nice recommendation! Thanks @dineshsprabu. – Fernando Kosh Apr 18 '17 at 19:22
Thanks Fernando Kosh – dineshsprabu Apr 19 '17 at 07:29

score 5 · Answer 4 · answered Aug 06 '15 at 14:04

You can also try Oga by Yorick Peterse.

It is an XML/HTML parser written in Ruby that does not require system libraries such as libxml. You can find it here: https://github.com/YorickPeterse/oga

answered Aug 06 '15 at 14:04 by microspino
