
I am trying to parse websites and get the XPath for specific pieces of content.

For example, on www.stackoverflow.com I want to get the XPath of the 'Questions' button. Using a Chrome extension, I found that the following XPath selects 'Questions':

/html/body[@class='home-page new-topbar']/div[@class='container']/div[@id='header']/div[@id='hmenus']/div[@class='nav mainnavs']/ul/li[1]/a[@id='nav-questions']

Is there a way to programmatically get the XPath for a given piece of content on a webpage?
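A minimal sketch of one way to do this with only Python's standard library. It assumes the page has already been turned into a well-formed, namespace-free XML tree (real-world HTML usually needs a lenient parser such as lxml.html first) and that the target text is unique, so the first match is used. The idea: find the element whose text equals the target string, then walk up to the root, recording each step as tag[n], where n is the 1-based position among siblings with the same tag. The names find_by_text and xpath_of are hypothetical helpers, not part of any library.

```python
import xml.etree.ElementTree as ET

def find_by_text(root, text):
    """Return the first element whose own text (stripped) equals `text`, or None."""
    for el in root.iter():
        if (el.text or "").strip() == text:
            return el
    return None

def xpath_of(root, target):
    """Build an absolute, position-based XPath such as /html/body[1]/div[2]/a[1]."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    steps = []
    el = target
    while el is not root:
        p = parent[el]
        # Position is counted only among siblings that share the same tag.
        same_tag = [c for c in p if c.tag == el.tag]
        steps.append(f"{el.tag}[{same_tag.index(el) + 1}]")
        el = p
    steps.append(root.tag)
    return "/" + "/".join(reversed(steps))

# Hypothetical, simplified markup standing in for the real page.
page = """<html><body class="home-page">
  <div id="header"><ul>
    <li><a id="nav-questions">Questions</a></li>
    <li><a id="nav-tags">Tags</a></li>
  </ul></div>
</body></html>"""

root = ET.fromstring(page)
node = find_by_text(root, "Questions")
print(xpath_of(root, node))  # /html/body[1]/div[1]/ul[1]/li[1]/a[1]
```

A purely positional path like this is brittle: it breaks as soon as the layout shifts, which is why the path in the question also uses attributes. If lxml is available, its ElementTree objects provide getpath(element), which returns a structural, absolute XPath for an element.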

dfsq
Namaskara
  • see http://stackoverflow.com/questions/5046174/get-xpath-from-the-org-w3c-dom-node – Urban48 Jan 10 '15 at 09:01
  • Talking of "the XPath" suggests there is a unique answer. This is not the case. Your example XPath would not select unique nodes for all documents, e.g. if there are two divs with class="container". You need to specify your requirements more precisely. In particular, the problem becomes significantly more difficult when namespaces are involved. You might like to look at the spec of the fn:path() function in XPath 3.0: http://www.w3.org/TR/xpath-functions-30/#func-path – Michael Kay Jan 10 '15 at 10:03
  • To start off I can assume that the text I am searching for is unique and will appear in the document only once. – Namaskara Jan 10 '15 at 15:33
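Taking up the point that a generated path should select exactly one node, here is a sketch that shortens the path the way the example XPath in the question leans on @id. It climbs from the target element until it reaches an element whose id occurs exactly once in the document and anchors there with //tag[@id='...']. Duplicated ids are treated like ordinary elements and get a positional step instead. It assumes a well-formed, namespace-free XML tree and ids that contain no single quote (no escaping is done). The function name id_anchored_xpath is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def id_anchored_xpath(root, target):
    """Return an XPath for `target` that starts at the nearest ancestor-or-self
    whose id is unique in the document, else an absolute positional path."""
    parent = {child: p for p in root.iter() for child in p}
    # Count ids so that duplicated ids are never used as anchors.
    id_counts = Counter(e.get("id") for e in root.iter() if e.get("id") is not None)
    steps = []
    el = target
    while True:
        el_id = el.get("id")
        if el_id is not None and id_counts[el_id] == 1:
            # A unique id pins the node down; no further ancestors are needed.
            # Assumes the id contains no single quote.
            steps.append(f"{el.tag}[@id='{el_id}']")
            return "//" + "/".join(reversed(steps))
        if el is root:
            steps.append(el.tag)
            return "/" + "/".join(reversed(steps))
        p = parent[el]
        same_tag = [c for c in p if c.tag == el.tag]
        steps.append(f"{el.tag}[{same_tag.index(el) + 1}]")
        el = p

# Hypothetical markup: two div.container elements and a duplicated id.
page = """<html><body>
  <div class="container"><ul>
    <li><a id="nav-questions">Questions</a></li>
    <li><a>Tags</a></li>
  </ul></div>
  <div class="container"><p id="dup">x</p><p id="dup">y</p></div>
</body></html>"""

root = ET.fromstring(page)
questions = next(e for e in root.iter("a") if e.text == "Questions")
tags = next(e for e in root.iter("a") if e.text == "Tags")
second_p = next(e for e in root.iter("p") if e.text == "y")
for el in (questions, tags, second_p):
    print(id_anchored_xpath(root, el))
```

In this sample, the 'Questions' link gets the short //a[@id='nav-questions'] form. The 'Tags' link and the second paragraph fall back to positional steps (/html/body[1]/div[1]/ul[1]/li[2]/a[1] and /html/body[1]/div[2]/p[2]), because one has no id and the other's id is duplicated. Class attributes are deliberately not used as anchors, since, as noted above, two divs can share class="container".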
