
Last week I started writing a script in Ruby. I needed to scrape some data from the web, so I was told to use Mechanize and then Nokogiri.

The Mechanize documentation says:

Mechanize uses nokogiri to parse html. What does this mean for you? You can treat a mechanize page like a nokogiri object. After you have used Mechanize to navigate to the page that you need to scrape, then scrape it using nokogiri methods.

I know that I can use xpath or at_xpath, because they came up in "How do I parse an HTML table with Nokogiri?", but I do not know the exact syntax of these methods, how they differ, etc.

I was told in "how to use nokogiri methods .xpath & .at_xpath" that

I often use text() expression. This is not required using Nokogiri. You can retrieve the node then call the text method on the node. It's much less expensive.

I tried to search Nokogiri's documentation but didn't find anything on that.

Is there somebody out there who can help me read Nokogiri's documentation?

I want to know how to use the text method instead of the XPath text() function.

the Tin Man
Radek

2 Answers


I am not really sure what the problem is with reading Nokogiri's documentation. A quick search for "nokogiri" on Google returns "nokogiri.org" as the first hit. That is the documentation page.

In Ruby, text() is the same as text if you are not passing parameters. text() is an alias for inner_text(), which will

Get the inner text of all contained Node objects

Searching nokogiri.org for "text" will get you started.

the Tin Man

I think part of what the author means is that the documentation on the site is not presented in the standard format used by other sites that generate their docs with RDoc and similar tools, i.e. it is hard to read.

To answer, or try to: I've had luck searching GitHub for projects that use Nokogiri and learning from their source code.

sullivan.t