lxml: xpath search within subelement

Question

I'm writing a script to scrape a webpage. My (debug) code is as follows (I saved a local copy of the page for faster testing):

from lxml import html

doc = html.parse(r"C:\debug.html")  # raw string so the backslash is kept literally
links = doc.xpath("//div[@class='download_item']")
print(links[0].xpath("//div[@class='download_title']"))
print(doc.xpath("//div[@class='download_title']"))

Unexpectedly, both printouts yield the same result (the one I would expect from searching the whole document). I verified that the `links` list contains the desired elements. Now I want to search each element of `links` for the "download_title" class; there should be only one result per element. However, it seems that the whole document is scanned. Why does XPath search the whole document (`doc`) rather than the element indicated (`links[0]`), and how do I search only within the subelement `links[0]`?

Read up on [XPath - Quick Guide](https://www.tutorialspoint.com/xpath/xpath_quick_guide.htm) section **XPath - Expression**. — stovfl, Mar 08 '20 at 22:08
With trial and error I found that `links[0].xpath(".//div[@class='download_title']")` does what I want. I still don't understand why the dot before the // is necessary. The guide says "// Selection starts from the current node that match the selection". So why does the search start in doc rather than in links[0]? — Menacer, Mar 09 '20 at 04:23
***"guide says "// Selection starts from the current node that match the selection""***: Confusing, I agree. The difference from `'.'` is ***"...that match the selection"***, whatever that means. — stovfl, Mar 09 '20 at 09:04
It is described more clearly in the XPath 1.0 recommendation (https://www.w3.org/TR/1999/REC-xpath-19991116/#path-abbrev). `//para` selects all the `para` descendants of the document root and thus selects all `para` elements in the same document as the context node. `.//para` selects the `para` element descendants of the context node. — mzjn, Mar 09 '20 at 10:08
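The difference between `//` and `.//` described in the comment above can be checked with a small self-contained sketch. The markup below is a hypothetical stand-in for the saved page (only the class names come from the question), and `html.fromstring` is used instead of `html.parse` so that no file is needed.

```python
from lxml import html

# Hypothetical markup standing in for the saved page;
# only the class names are taken from the question.
PAGE = """
<html><body>
  <div class="download_item"><div class="download_title">First</div></div>
  <div class="download_item"><div class="download_title">Second</div></div>
</body></html>
"""

doc = html.fromstring(PAGE)
items = doc.xpath("//div[@class='download_item']")

# Absolute path: evaluation starts at the document root,
# even though xpath() is called on a subelement.
absolute = [d.text for d in items[0].xpath("//div[@class='download_title']")]

# Relative path: the leading dot anchors the search at the context node (items[0]).
relative = [d.text for d in items[0].xpath(".//div[@class='download_title']")]

# One title per item, each searched only within its own subtree.
titles = [item.xpath(".//div[@class='download_title']")[0].text for item in items]

print(absolute)
print(relative)
print(titles)
```

With this sample markup, the absolute query on `items[0]` should return both titles, while the relative query returns only the title nested inside `items[0]`.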
