Take a look at this page: http://finviz.com/quote.ashx?t=AAPL

The XPath

(//*[contains(concat(' ', @class, ' '), ' snapshot-table2 ')]/tbody/tr/td[text() = 'Index']/following::td)[1]/b

should return S&P 500, and it does when I run it in the browser's JavaScript console. When I run the same XPath through Nokogiri in Ruby, however, it returns an empty array.

Here's the full code:

url = "http://finviz.com/quote.ashx?t=#{ticker}"
data = Nokogiri::HTML(open(url))
xpath = "(//*[contains(concat(' ', @class, ' '), ' snapshot-table2 ')]/tbody/tr/td[text() = '#{datapoint}']/following::td)[1]/b"
data.xpath(xpath)

data.xpath(xpath) returns []

Any ideas?

  • You can't trust the browser to tell you the correct path to a node. Browsers try to present something usable, and resort to all sorts of fix-ups with malformed HTML. That "fixed" HTML is what it will show you. Instead, look at the same HTML that Nokogiri will see when your `open` statement hands off the IO handle. Use `open(url).read` and you'll see the raw HTML as sent by the server, which will give you a better chance to find the right XPath. – the Tin Man Jul 22 '14 at 23:15

1 Answer

In the raw response for that page, I don't see a <tbody> element.

You mentioned that your XPath works in your browser console. That is likely because the browser injects the tbody element when it builds the DOM; see "Why do browsers insert tbody element into table elements?" for details on why this happens.

Try your XPath again without specifying the <tbody> node:

data.xpath("(//*[contains(@class, 'snapshot-table2')]/tr/td[text() = 'Index']/following::td)[1]/b")

You could also use something like this to get the same XPath to work in both the browser and your Ruby code:

(//*[contains(@class, 'snapshot-table2')]//tr/td[text() = 'Index']/following::td)[1]/b
  • The browser adds `tbody` because it's trying to follow the spec. Whether the HTML had `<tbody>` originally isn't important to the browser, but it is to a parser. – the Tin Man Jul 22 '14 at 23:17