28

Given:

require 'rubygems'
require 'nokogiri'
value = Nokogiri::HTML.parse(<<-HTML_END)
<html>
<body>
  <p id='para-1'>A</p>
  <div class='block' id='X1'>
    <h1>Foo</h1>
    <p id='para-2'>B</p>
  </div>
  <p id='para-3'>C</p>
  <h2>Bar</h2>
  <p id='para-4'>D</p>
  <p id='para-5'>E</p>
  <div class='block' id='X2'>
    <p id='para-6'>F</p>
  </div>
</body>
</html>
HTML_END

I want to do something like what I can do in Hpricot:

divs = value.search('//div[@id^="para-"]')
  1. How do I do a pattern search for elements in XPath style?
  2. Where would I find the documentation to help me? I didn't see this in the rdocs.
Jason Swett
bcolfer
    PSA: For those attempting more complex regex, this is likely what you're looking for: http://stackoverflow.com/questions/649963/nokogiri-searching-for-div-using-xpath – DreadPirateShawn Feb 01 '15 at 06:19

4 Answers

75

Use the xpath function starts-with:

value.xpath('//p[starts-with(@id, "para-")]').each { |x| puts x['id'] }
the Tin Man
Aaron Patterson
19
divs = value.css('div[id^="para-"]')
the Tin Man
insane.dreamer
3

And some docs you're seeking:

andre-r
1
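# Adds Node#xpath_regex, which accepts predicates of the form tag[@attr~=/pattern/]
# and rewrites each one into a call to a custom XPath function named regex.
Nokogiri::XML::Node.send(:define_method, 'xpath_regex') { |*args|
  rgxp = /\/([a-z]+)\[@([a-z\-]+)~=\/(.*?)\/\]/
  # Use gsub rather than gsub! so the caller's string is not modified.
  xpath = args[0].gsub(rgxp) { "/#{$1}[regex(.,'#{$2}','#{$3}')]" }
  self.xpath(xpath, Class.new {
    # Keep only the nodes whose attribute value matches the pattern.
    def regex node_set, attr, regex
      node_set.find_all { |node| node[attr] =~ /#{regex}/ }
    end
  }.new)
}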

Usage:

divs = Nokogiri::HTML(page.root.to_html).
  xpath_regex("//div[@class~=/axtarget$/]//div[@class~=/^carbo/]")
karwan