
Why does the CSS selector return the correct info, but the XPath does not?

require 'nokogiri'

source = "<hgroup class='page-header channel post-head' data-channel='tech' data-section='sec0=tech&amp;sec1=index&amp;sec2='><h2>Tech</h2></hgroup>"

doc = Nokogiri::HTML(source)
doc.xpath('//hgroup[case_insensitive_equals(@class,"post-head")]//h2', XpathFunctions.new)
 => [] 

doc.css("hgroup.post-head")[0].css("h2")
 => [#<Nokogiri::XML::Element:0x6c2b824 name="h2" children=[#<Nokogiri::XML::Text:0x6c2b554 "Tech">]>] 
Jordan Running
Henley

1 Answer


Assuming case_insensitive_equals does what its name suggests, the XPath returns nothing because the class attribute isn’t equal to post-head (case-insensitively or not); it only contains it. XPath treats a class attribute as a plain string. It doesn’t split the value on whitespace and match the individual classes the way CSS does.
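To make the difference concrete, here is a minimal plain-Ruby sketch (no Nokogiri involved) of the two comparisons. The variable names are made up for illustration, and the assumption is that case_insensitive_equals compares the whole attribute value:

```ruby
# The attribute value from the question, as XPath sees it: one string.
class_attr = "page-header channel post-head"

# Roughly what an equality-style function tests: the whole string.
whole_string_equal = class_attr.casecmp?("post-head")

# Roughly what a CSS class selector tests: any whitespace-separated token.
token_match = class_attr.split.include?("post-head")

puts whole_string_equal  # false
puts token_match         # true
```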

A simple XPath that would work would be:

doc.xpath('//hgroup[contains(@class, "post-head")]//h2')

(I’ve removed the custom function; you will need to write your own to do this case-insensitively.)

This isn’t quite the same as the CSS selector, though, because contains does a substring match and will also match classes such as not-post-head. A more complete XPath would be something like this:

doc.xpath('//hgroup[contains(concat(" ", normalize-space(@class), " "), " post-head ")]//h2')
Community
matt