How to use Nokogiri's xpath and at_xpath methods

Question

I'm learning how to use Nokogiri, and a few questions came up based on this code:

require 'rubygems'
require 'mechanize'

# Current Mechanize releases expose the class as Mechanize; the WWW:: namespace was removed.
post_agent = Mechanize.new
post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708')

puts "\nabsolute path with tbody gives nil"
puts  post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]').xpath('text()').to_s.strip.inspect

puts "\n.at_xpath gives an empty string"
puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").at_xpath('text()').to_s.strip.inspect

puts "\ntwo lines solution with .at_xpath gives an empty string"
rows =   post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")
puts rows[0].at_xpath('text()').to_s.strip.inspect


puts
puts "two lines working code"
rows =   post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")
puts rows[0].xpath('text()').to_s.strip

puts "\none line working code"
puts post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]")[0].xpath('text()').to_s.strip

puts "\nanother one line code"
puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").xpath('text()').to_s.strip

puts "\none line code with full path"
puts post_page.parser.xpath("/html/body/div/div/div/div/div/table/tr/td/div[2]")[0].xpath('text()').to_s.strip

1. Is it better to use // or / in XPath? @AnthonyWJones says that "the use of an unprefixed //" is not such a good idea.
2. I had to remove tbody from every working XPath; otherwise I got a nil result. How is it possible to remove an element from the XPath and still get things to work?
3. Do I have to use xpath twice to extract data if I'm not using a full XPath?
4. Why can't I make at_xpath work to extract data? It works nicely in "How do I parse an HTML table with Nokogiri?". What is the difference?

score 10 · Accepted Answer · edited Feb 07 '20 at 19:55

1. // selects matching nodes at every level below the context node, so it's much more expensive than /, which only steps to direct children.
2. You can use * as a placeholder for any element.
3. No. You can make one XPath query, get the element, then call Nokogiri's text method on the node.
4. Sure you can. Have a look at "What is the absolutely cheapest way to select a child node in Nokogiri?" and my benchmark file. You will see an example of at_xpath.

I noticed that you often use the text() expression. This is not required with Nokogiri: you can retrieve the node, then call the text method on it. It's much less expensive.

Also keep in mind Nokogiri supports CSS selectors. They can be easier if you are working with HTML pages.

edited Feb 07 '20 at 19:55

the Tin Man

answered Jan 22 '10 at 20:29

Simone Carletti

@Simone Carletti: thank you for that. Maybe all my questions come up because I do not know how to read the documentation on http://nokogiri.org. I could not find anything there about calling the text method on a node. Would it be possible to write more about that? I already find my script a bit slow, so it would be great to make it faster. – Radek Jan 22 '10 at 22:01
I found that an XPath placeholder is a real XPath expression. So what does it mean to use * as a placeholder? – Radek Jan 22 '10 at 22:11
* means any element. For instance, given `/node/foo/one` and `/node/bar/one`, `/node/*/one` matches both paths. – Simone Carletti Feb 01 '10 at 10:19
Could you tell me how to use the text method? – Radek Feb 16 '10 at 06:12
