I'd like to flatten an lxml etree (specifically HTML, if it matters). How would I go about getting a flat list of all elements in the tree?

Walrus the Cat
  • http://lxml.de/tutorial.html#tree-iteration – Robᵩ Oct 06 '14 at 19:47
  • possible duplicate of [How to get all sub-elements of an element tree with Python ElementTree?](http://stackoverflow.com/questions/10408927/how-to-get-all-sub-elements-of-an-element-tree-with-python-elementtree) – Cory Kramer Oct 06 '14 at 19:47
  • Please quit voting to close. I need a complete, recursive listing of all elements, i.e. something like tree.flatten(). – Walrus the Cat Oct 06 '14 at 20:00

1 Answer

You can use the .iter() method, like so:

from lxml import etree

xml = etree.XML('''<html><body>
                   <p>hi there</p><p>2nd paragraph</p>
                   </body></html>''')

# If you want to visit every element (the root and all of its descendants)
for element in xml.iter():
    print(element.tag)

# Or, if you want a list of all those elements
all_elements = list(xml.iter())
print([element.tag for element in all_elements])
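
As a rough sketch of what a reusable flatten helper might look like: the name flatten is hypothetical (taken from the asker's tree.flatten() wording, not from lxml), and the fallback import to the standard library is only an assumption so that the snippet runs without lxml installed. In lxml, .iter() also yields comments and processing instructions, whose .tag is not a string, so the helper filters those out. Passing a tag name to .iter() restricts the walk to matching elements.

```python
try:
    from lxml import etree  # the library the question is about
except ImportError:
    # Assumption: fall back to the stdlib API, which also provides .iter()
    import xml.etree.ElementTree as etree


def flatten(root):
    """Hypothetical helper: root plus all descendants, in document order."""
    # Comments and processing instructions have a non-string .tag in lxml,
    # so this check keeps only real elements.
    return [node for node in root.iter() if isinstance(node.tag, str)]


doc = etree.fromstring(
    "<html><body><!-- note --><p>hi there</p><p>2nd paragraph</p></body></html>"
)

# Flat list of every element, including the root itself
tags = [element.tag for element in flatten(doc)]
print(tags)

# Restrict the iteration to a single tag name
paragraphs = [p.text for p in doc.iter("p")]
print(paragraphs)
```

Note that the root element is part of the result; slice it off with flatten(doc)[1:] if only the descendants are wanted.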
Robᵩ
  • Accepting for the list comprehension: elements = [element for element in tree.iter()]. Actually, list(tree.iter()) is more elegant. – Walrus the Cat Oct 06 '14 at 20:02