I have an XML document in which an element contains multiple text nodes. Using lxml's etree in Python 2, I want to walk the tree in document order, visiting both the elements and the text nodes.

So, for this input:

<body>
  hello
  <b>world</b>
  bye
</body>

I need to be able to produce this output in this exact order:

tag: body
   text: hello
   tag: b
       text: world
   text: bye

However, I do not see a function in etree that iterates over both elements and text nodes.

How can I do that? I am looking for something like the following (the method iterateElementsAndTextNodes does not exist):

from lxml import etree
import utils

doc = etree.XML("""<body>hello<b>world</b>bye</body>""")

def printNode(node, prefix):
    if isinstance(node, str):
        print prefix + "text: " + node
    else:
        print prefix + "tag: " + node.tag
        for c in node.iterateElementsAndTextNodes():
            printNode(c, prefix + "   ")

printNode(doc, "")
David Portabella
  • Possible duplicate of [Efficient way to iterate throught xml elements](http://stackoverflow.com/questions/4695826/efficient-way-to-iterate-throught-xml-elements) – stovfl Apr 07 '17 at 17:30

1 Answer

The XPath expression child::node() selects all children of the context node, whatever their node type, so it returns text nodes and elements together in document order. Change the for loop to:

for c in node.xpath("child::node()"):
    printNode(c, prefix + "   ")

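Full code:

from lxml import etree

doc = etree.XML("""<body>hello<b>world</b>bye</body>""")

def printNode(node, prefix):
    # XPath returns text nodes as "smart strings": _ElementStringResult for
    # byte strings and _ElementUnicodeResult for unicode text. Both are
    # subclasses of basestring in Python 2.
    if isinstance(node, basestring):
        print prefix + "text: " + node
    else:
        # Assumes the document has no comments or processing instructions,
        # whose .tag is not a string.
        print prefix + "tag: " + node.tag
        # child::node() selects element and text children in document order
        for c in node.xpath("child::node()"):
            printNode(c, prefix + "   ")

printNode(doc, "")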
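
If you would rather not use XPath, the same traversal can be built on the ElementTree data model, where the text before the first child is stored in the parent's .text and the text after each child is stored in that child's .tail. The following is only a minimal sketch of that idea: the helper names iter_children_and_text and describe are made up for illustration, and it uses the standard library's xml.etree.ElementTree so that it runs without lxml (lxml.etree exposes the same .text and .tail attributes).

```python
import xml.etree.ElementTree as ET  # assumption: stdlib parser instead of lxml


def iter_children_and_text(elem):
    """Yield the text chunks and child elements of elem in document order."""
    if elem.text:
        yield elem.text          # text before the first child
    for child in elem:
        yield child
        if child.tail:
            yield child.tail     # text that follows this child


def describe(node, prefix=""):
    """Return the output lines for node and everything below it."""
    if not hasattr(node, "tag"):
        # Plain strings have no .tag attribute, so this is a text chunk.
        return [prefix + "text: " + node]
    lines = [prefix + "tag: " + node.tag]
    for c in iter_children_and_text(node):
        lines.extend(describe(c, prefix + "   "))
    return lines


doc = ET.fromstring("<body>hello<b>world</b>bye</body>")
print("\n".join(describe(doc)))
```

With the indented form of the input from the question, each text chunk also carries the surrounding whitespace and newlines, so you may want to call .strip() on it and skip chunks that end up empty.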
devautor