
I'm writing a script to scrape a webpage. My (debug) code is as follows (I saved a local copy of the page for faster testing):

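from lxml import html

doc = html.parse(r"C:\debug.html")  # raw string so the backslash is kept literally
links = doc.xpath("//div[@class='download_item']")
print(links[0].xpath("//div[@class='download_title']"))  # expected: titles inside links[0] only
print(doc.xpath("//div[@class='download_title']"))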

Unexpectedly, both print statements produce the same result: the one I would expect from a search of the whole document. I verified that the `links` list contains the desired elements. Now I want to search each element of `links` for the "download_title" class; there should be only one result per element. However, it seems that the whole document is scanned. Why does `xpath` search the whole document (`doc`) instead of the element it is called on (`links[0]`), and how can I restrict the search to the subelement `links[0]`?
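The behavior can be reproduced without the local file. This is a minimal sketch, assuming hypothetical inline markup: the `PAGE` string and its `First`/`Second` titles are invented stand-ins for `debug.html`, with two `download_item` divs that each hold one `download_title`.

```python
from lxml import html

# Hypothetical markup standing in for debug.html: two download items,
# each holding exactly one download_title.
PAGE = """
<html><body>
  <div class="download_item"><div class="download_title">First</div></div>
  <div class="download_item"><div class="download_title">Second</div></div>
</body></html>
"""

doc = html.fromstring(PAGE)
links = doc.xpath("//div[@class='download_item']")

# A path starting with // is absolute: it begins at the document root,
# so the element it is called on only determines which document is searched.
from_element = links[0].xpath("//div[@class='download_title']")
from_document = doc.xpath("//div[@class='download_title']")

print([d.text for d in from_element])   # ['First', 'Second']
print([d.text for d in from_document])  # ['First', 'Second']
```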

Menacer
  • Read up on [XPath - Quick Guide](https://www.tutorialspoint.com/xpath/xpath_quick_guide.htm) section **XPath - Expression**. – stovfl Mar 08 '20 at 22:08
  • Through trial and error I found that `links[0].xpath(".//div[@class='download_title']")` does what I want. I still don't understand why the dot before the `//` is necessary. The guide says "// Selection starts from the current node that match the selection". So why does the search start at doc rather than at links[0]? – Menacer Mar 09 '20 at 04:23
  • ***"guide says " // Selection starts from the current node that match the selection"***: Confusing, I agree; the difference from `'.'` is ***"...that match the selection"***, whatever that means. – stovfl Mar 09 '20 at 09:04
  • It is described more clearly in the XPath 1.0 recommendation (https://www.w3.org/TR/1999/REC-xpath-19991116/#path-abbrev). `//para` selects all the `para` descendants of the document root and thus selects all `para` elements in the same document as the context node. `.//para` selects the `para` element descendants of the context node. – mzjn Mar 09 '20 at 10:08
  • Thank you @mzjn. Very helpful link that I bookmarked now. – Menacer Mar 09 '20 at 10:17
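Following mzjn's explanation, here is a minimal sketch of the per-element search Menacer found by trial and error, again on hypothetical inline markup (`PAGE` and the title texts are invented). It also checks that the unabbreviated `descendant::` axis selects the same nodes for this step:

```python
from lxml import html

# Hypothetical markup standing in for debug.html (same shape as the question).
PAGE = """
<html><body>
  <div class="download_item"><div class="download_title">First</div></div>
  <div class="download_item"><div class="download_title">Second</div></div>
</body></html>
"""

doc = html.fromstring(PAGE)
titles = []
for item in doc.xpath("//div[@class='download_item']"):
    # The leading dot makes the path relative to the context node (item),
    # so only descendants of this one download_item are searched.
    matches = item.xpath(".//div[@class='download_title']")
    # The unabbreviated descendant axis selects the same nodes for this step.
    longhand = item.xpath("descendant::div[@class='download_title']")
    assert [m.text for m in matches] == [m.text for m in longhand]
    titles.append(matches[0].text)

print(titles)  # ['First', 'Second']
```

In XPath 1.0, `.//x` abbreviates `self::node()/descendant-or-self::node()/child::x`. It matches `descendant::x` here, though the two can differ once positional predicates such as `[1]` are added.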
