I'd like to flatten an lxml etree (specifically HTML, if it matters). How would I go about getting a flat list of all elements in the tree?
12
-
http://lxml.de/tutorial.html#tree-iteration – Robᵩ Oct 06 '14 at 19:47
-
possible duplicate of [How to get all sub-elements of an element tree with Python ElementTree?](http://stackoverflow.com/questions/10408927/how-to-get-all-sub-elements-of-an-element-tree-with-python-elementtree) – Cory Kramer Oct 06 '14 at 19:47
-
Quit voting to close. I need a complete, recursive listing of all elements, i.e. tree.flatten(). – Walrus the Cat Oct 06 '14 at 20:00
1 Answer
18
You can use the .iter() method, like so:
from lxml import etree

xml = etree.XML('''<html><body>
<p>hi there</p><p>2nd paragraph</p>
</body></html>''')

# Visit the root element and all of its descendants, in document order
for element in xml.iter():
    print(element.tag)

# Or, if you want a list of all of those elements
all_elements = list(xml.iter())
print([element.tag for element in all_elements])
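If the tree may contain comments or processing instructions, or you only want certain tags, the same idea can be wrapped in a small helper. This is a minimal sketch assuming Python 3; `flatten` is a hypothetical helper name, not part of lxml, and the fallback import assumes the standard-library ElementTree behaves the same way for this use of `iter()`.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree offers the same iter() API for this case
    import xml.etree.ElementTree as etree


def flatten(root, tag=None):
    """Return a flat list of elements in document order, root included.

    Hypothetical helper: `tag`, if given, restricts the result to that tag name.
    """
    elements = root.iter(tag) if tag else root.iter()
    # lxml also yields comments and processing instructions from iter();
    # their .tag is not a string, so this check keeps real elements only.
    return [el for el in elements if isinstance(el.tag, str)]


root = etree.XML(
    '<html><body><!-- note --><p>hi there</p><p>2nd paragraph</p></body></html>'
)

tags = [el.tag for el in flatten(root)]
p_texts = [el.text for el in flatten(root, 'p')]
print(tags)
print(p_texts)
```

Because `iter()` is lazy, calling it directly in a `for` loop avoids building the list at all; the helper is only worth it when you actually need the flat list.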

Robᵩ
-
Accepting for the list comprehension: elements = [element for element in tree.iter()]. Actually, list(tree.iter()) is more elegant. – Walrus the Cat Oct 06 '14 at 20:02