Your code does work for me: it returns a list of <a> elements. If you want a list of href values rather than the elements themselves, add /@href:
hrefs = tree.xpath('//a[i/@class="foobar"]/@href')
You could also first find the <i> elements, then use /parent::* (or simply /..) to get back to the <a> elements.
hrefs = tree.xpath('//a/i[@class="foobar"]/../@href')
#                   ^                      ^  ^
#                   |                      |  obtain the 'href'
#                   |                      |
#                   |                      get the parent of the <i>
#                   |
#                   find all <i class="foobar"> contained in an <a>
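As a quick check, here is a minimal sketch that runs both expressions against a small made-up document and confirms they select the same links. The sample HTML and its URLs are assumptions invented for illustration, not taken from the question.

```python
import lxml.html

# Hypothetical sample document, invented for illustration.
sample = '''
<html><body>
  <a href="/one"><i class="foobar"></i>first</a>
  <a href="/two"><i class="other"></i>second</a>
  <a href="/three"><i class="foobar"></i>third</a>
</body></html>
'''

tree = lxml.html.fromstring(sample)

# Predicate form: select each <a> that has an <i class="foobar"> child.
via_predicate = tree.xpath('//a[i/@class="foobar"]/@href')

# Parent form: select the matching <i> elements, then step back up to the <a>.
via_parent = tree.xpath('//a/i[@class="foobar"]/../@href')

print(via_predicate)  # ['/one', '/three']
print(via_parent)     # ['/one', '/three']
```

If both lists come back empty on your real page, the markup probably differs from what the expression assumes.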
If none of these work, you may want to verify that the structure of the document is what you expect.
Note that XPath won't look inside comments (<!-- -->). If the <a> is actually inside a comment, you need to extract the comment's content and parse it yourself first.
import lxml.html

hrefs = [href
         # find all comments
         for comment in tree.xpath('//comment()')
         # parse the content of each comment as a new HTML document,
         # then read the hrefs from it
         for href in lxml.html.fromstring(comment.text)
                              .xpath('//a[i/@class="foobar"]/@href')]
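A self-contained sketch of the comment approach, using a made-up document in which one link is commented out (the sample HTML is an assumption). It also skips blank comments, because lxml.html.fromstring can raise a ParserError on empty input. That guard is an addition here, not part of the snippet above.

```python
import lxml.html

# Hypothetical sample document: the second link is commented out.
sample = '''
<html><body>
  <a href="/visible"><i class="foobar"></i>shown</a>
  <!-- <a href="/hidden"><i class="foobar"></i>commented out</a> -->
  <!-- -->
</body></html>
'''

tree = lxml.html.fromstring(sample)
query = '//a[i/@class="foobar"]/@href'

# Links in the live markup; the commented-out one is not seen.
visible = tree.xpath(query)

# Links recovered by re-parsing the text of each non-blank comment.
hidden = [href
          for comment in tree.xpath('//comment()')
          if comment.text and comment.text.strip()  # skip blank comments
          for href in lxml.html.fromstring(comment.text).xpath(query)]

print(visible)  # ['/visible']
print(hidden)   # ['/hidden']
```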