[Python] I'm trying to retrieve every element in an XML document that has an `href` attribute, at any level of nesting. For example, given this document:
```xml
<OuterElement href='a.com'>
  <InnerElement>
    <NestedInner href='b.com' />
    <NestedInner href='c.com' />
    <NestedInner />
  </InnerElement>
  <InnerElement href='d.com'/>
</OuterElement>
```
the search should retrieve the following elements (as lxml element objects, simplified for visual clarity):

```
[<OuterElement href='a.com'>, <NestedInner href='b.com' />, <NestedInner href='c.com' />, <InnerElement href='d.com'/>]
```
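A minimal sketch of producing that result with lxml, assuming the sample above is parsed from a bytes literal with `etree.fromstring` (which returns the root element, `OuterElement`). It also shows why the leading dot matters: `.//*` is short for `self::node()/descendant-or-self::node()/child::*`, so it only matches descendants of the context node. Evaluated on the root element it therefore skips `OuterElement` itself, whereas `//*` searches the whole document and `descendant-or-self::*` includes the context node.

```python
from lxml import etree

# The sample document from the question, as bytes.
SAMPLE = b"""<OuterElement href='a.com'>
  <InnerElement>
    <NestedInner href='b.com' />
    <NestedInner href='c.com' />
    <NestedInner />
  </InnerElement>
  <InnerElement href='d.com'/>
</OuterElement>"""

root = etree.fromstring(SAMPLE)  # the root element, OuterElement

# Absolute path: every element in the document with an href, in document order.
all_hrefs = root.xpath("//*[@href]")

# Relative path: only descendants of the context node, so OuterElement is skipped.
descendant_hrefs = root.xpath(".//*[@href]")

# descendant-or-self includes the context node itself.
self_and_descendants = root.xpath("descendant-or-self::*[@href]")

print([(el.tag, el.get("href")) for el in all_hrefs])
# → [('OuterElement', 'a.com'), ('NestedInner', 'b.com'), ('NestedInner', 'c.com'), ('InnerElement', 'd.com')]
print([el.get("href") for el in descendant_hrefs])
# → ['b.com', 'c.com', 'd.com']
```

Either `//*[@href]` or `descendant-or-self::*[@href]` yields the full expected list; which one fits depends on whether the search should be scoped to a particular subtree.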
I've tried the following code to retrieve every element with an `href` attribute, but it returns zero elements on a file full of elements with `href` attributes:
```python
from lxml import etree

with open(file, 'rb') as f:  # `file` is the path to the XML file
    xml_tree = etree.parse(f)
href_elements = xml_tree.xpath(".//*[@href]")
```
Shouldn't this code select every element (`.//*`) that has the specified attribute (`[@href]`)? From my understanding (definitely correct me if I'm wrong, I most likely am), `href_elements` should be a list of lxml element objects that each have an `href` attribute.
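On the sample document, `.//*[@href]` does match elements, so getting zero matches on the real file suggests that file differs from the sample in some way. One possible cause, and it is only an assumption since the real file isn't shown, is that its attributes are namespaced, e.g. `xlink:href` as used in SVG. In XPath, `@href` only matches an attribute named `href` with no namespace. The hypothetical document below illustrates this and compares `@href` with a `local-name()` test that matches `href` in any namespace, plus an explicit prefix mapping via lxml's `namespaces` argument.

```python
from lxml import etree

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical document (not the asker's file): the href attributes are in the
# XLink namespace, and the elements are in a default (SVG) namespace.
NAMESPACED = b"""<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <a xlink:href="a.com"/>
  <use xlink:href="#b"/>
  <g/>
</svg>"""

root = etree.fromstring(NAMESPACED)

# @href only matches an attribute named "href" that has no namespace.
plain = root.xpath("//*[@href]")

# @*[local-name()='href'] matches an href attribute in any namespace (or none).
# Note that * already matches elements in any namespace.
any_ns = root.xpath("//*[@*[local-name()='href']]")

# Alternatively, map a prefix to the namespace URI and use it in the expression.
xlink = root.xpath("//*[@xlink:href]", namespaces={"xlink": XLINK_NS})

print(len(plain), len(any_ns), len(xlink))  # → 0 2 2

# Namespaced attribute values are read with Clark notation: {uri}localname.
print([el.get("{%s}href" % XLINK_NS) for el in any_ns])  # → ['a.com', '#b']
```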
Important clarification: I have seen many Stack Overflow questions about XPath, but I have yet to find a solved one about searching through all elements in an XML document and retrieving every element that matches a criterion (such as having an `href` attribute).
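For criteria that are awkward to express in XPath, a minimal sketch that walks every element with lxml's `iter()` and filters with an ordinary Python predicate. `find_matching` is a hypothetical helper name, and the sample XML from the question is assumed. Passing `etree.Element` as the tag filter restricts the walk to real elements, skipping comments and processing instructions.

```python
from lxml import etree

SAMPLE = b"""<OuterElement href='a.com'>
  <InnerElement>
    <NestedInner href='b.com' />
    <NestedInner href='c.com' />
    <NestedInner />
  </InnerElement>
  <InnerElement href='d.com'/>
</OuterElement>"""


def find_matching(root, predicate):
    """Hypothetical helper: return every element in root's subtree
    (root included) for which predicate(element) is true, in document order."""
    # iter() starts with root itself and then visits all descendants.
    return [el for el in root.iter(etree.Element) if predicate(el)]


root = etree.fromstring(SAMPLE)

# Criterion 1: the element has an href attribute at all.
with_href = find_matching(root, lambda el: el.get("href") is not None)

# Criterion 2: a NestedInner element that has no href.
bare_nested = find_matching(
    root, lambda el: el.tag == "NestedInner" and "href" not in el.attrib
)

print([el.get("href") for el in with_href])  # → ['a.com', 'b.com', 'c.com', 'd.com']
print(len(bare_nested))  # → 1
```

The XPath route keeps the condition inside the expression (`//*[@href]`), while `iter()` trades some speed for the freedom to use arbitrary Python logic per element.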