1

From the website https://www.google.co.uk/finance?cid=704531 I'm trying to import the fields "Range" and "52 week" and the others in that group in the middle. I've looked at XPath and am very confused by it.

In cell A1 I have https://www.google.co.uk/finance?cid=704531.

In cell B1 I have

=importxml(A1,//*[@id="market-data-div"]/div[2]/div[1]/table[1]/tbody/tr[4]/td[1])

But the result is an error.

What's the correct XPath expression for this HTML structure?

Rubén
  • 34,714
  • 9
  • 70
  • 166
user2531064
  • 11
  • 1
  • 3
  • Welcome on Stackoverflow! Please pay a little more attention on formatting and do not use all caps, it makes your question hard to read and understand and reduces chances somebody will answer it. The [faq] has some information on how to use Markdown for formatting your posts. Some kudos (and upvote) for trying yourself before asking! – Jens Erat Jun 28 '13 at 09:01
  • possible duplicate of [Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?](http://stackoverflow.com/questions/18241029/why-does-my-xpath-query-scraping-html-tables-only-work-in-firebug-but-not-the) – Jens Erat Aug 14 '13 at 21:04

1 Answers1

4

Dom and the <tbody/>

It seems you created that XPath expression within Firebug or similar developer tools. While HTML does not require <tbody/> tags, DOM does; and those developer tools work on the DOM and thus wrap the table rows in such an element. You can easily see the difference when looking at the page source (fetched by wget or other tools if necessary) and the structure presented by Firebug.

Just omit the /tbody axis step in your expression:

//*[@id='market-data-div']/div[2]/div[1]/table[1]/tr[4]/td[1]

Anyway, I'd rather go for fetching the cells by the description, which is a little weaker regarding changes on the wording, but much more robust to changes on the structure:

//*[@id='market-data-div']//tr[td = 'Vol / Avg.']/td[2]

Quoting the Expression

A second problem is you have to quote the XPath expression in double quotes, so for example use:

=importxml(A1,"//*[@id='market-data-div']//tr[td = 'Vol / Avg.']/td[2]")
Jens Erat
  • 37,523
  • 16
  • 80
  • 96
  • I have copied your "Vol / Avg." suggestion and appended it to =importxml(A1, So I have ended up with =importxml(A1,//*[@id="market-data-div"]//tr[td = "Vol / Avg."]/td[2]) Which makes sense, but I get a Parse Error in B1 – user2531064 Jun 28 '13 at 09:43
  • Oh, was too sure that the tbody was the problem so I actually never looked at how you embedded it in spreadsheets. You have to double quote the expression, so I switched to single quotes inside XPath. See my updates answer. – Jens Erat Jun 28 '13 at 09:55
  • Many thanks for helping me out with this. I have a result in cel B1, but some other text has come into it also. I was expecting to see "0.00/10.47M" but I have "367,913.00/10.47M" not sure where the "367,913" came from ... unless it's a more accurate figure ... May I ask if I wanted to change the formula to incorporate other fields, how do I change your formula to accept the other fields, such as "Shares" – user2531064 Jun 28 '13 at 10:04
  • I'm getting the idea, I've found an easier site to navigate through http://www.proactiveinvestors.co.uk/LON:SOLG/SolGold/financials This site has similar information, but is in an easier structure. I've used Chromes developer tool to locate the place where the information is, but now need to understand the xpath part. The xpath is /html/body/div[7]/div[3]/div/div/div[2]/div[2]/table[1]/tbody/tr[1]/td – user2531064 Jun 28 '13 at 10:14
  • Same thing again, remove the `/tbody` axis steps which Chrome inserted. – Jens Erat Jun 28 '13 at 10:31