
With lxml, how can I prepend processing instructions (PIs) before the root element, or append PIs after the root element?

Currently, the following example doesn't work:

from lxml import etree

root = etree.XML("<ROOT/>")
root.addprevious(etree.ProcessingInstruction("foo"))
print(etree.tounicode(root))

I get:

<ROOT/>

Instead of:

<?foo?><ROOT/>
Laurent LAPORTE

2 Answers


You need to pass the ElementTree, not just the Element, to tounicode():

from lxml import etree

root = etree.XML("<ROOT/>")
root.addprevious(etree.ProcessingInstruction("foo"))
print(etree.tounicode(root.getroottree()))

Output is almost what you wanted:

<?foo ?><ROOT/>

The extra space after foo appears because lxml renders a PI as pi.target + " " + pi.text, and here the text is empty.
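To illustrate the target-plus-text rendering described above, here is a minimal sketch with a PI that carries text. The xml-stylesheet target and the style.xsl file name are made up for the example; they are not part of the question.

```python
from lxml import etree

root = etree.XML("<ROOT/>")

# Hypothetical PI: the target is "xml-stylesheet", the text holds the pseudo-attributes.
pi = etree.ProcessingInstruction("xml-stylesheet", 'type="text/xsl" href="style.xsl"')
root.addprevious(pi)

# Serialize the tree (not the element) so the PI before the root is included.
serialized = etree.tostring(root.getroottree(), encoding="unicode")
print(serialized)
```

With non-empty text, the single space simply separates the target from the text, so it no longer looks out of place.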

Tupteq
  • I see people upvoting this answer, but the right answer is to use `etree.tounicode(root.getroottree())`: there is no need to create a new `ElementTree`. You may edit your answer so that I accept it. – Laurent LAPORTE Feb 14 '20 at 11:33

Actually, an Element is always attached to an ElementTree, even if it looks "detached":

root = etree.XML("<ROOT/>")
assert root.getroottree() is not None

When we use addprevious/addnext to insert a processing instruction before/after a root element, the PIs are not attached to a parent element (there isn't any) but they are attached to the root tree instead.
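One way to check this, as a small sketch: the PIs can be reached from the root element through getprevious() and getnext(), while their getparent() returns None. The variable names before and after are only illustrative.

```python
from lxml import etree

root = etree.XML("<ROOT/>")
root.addprevious(etree.ProcessingInstruction("foo"))
root.addnext(etree.ProcessingInstruction("bar"))

# The PIs are siblings of the root element at document level.
before = root.getprevious()
after = root.getnext()
print(before.target, after.target)

# They have no parent element, since nothing sits above the root.
print(before.getparent(), after.getparent())
```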

So, the problem lies in the usage of tounicode (or tostring). The best practice is to print the XML of the root tree, not the root element.

from lxml import etree

root = etree.XML("<ROOT/>")
root.addprevious(etree.ProcessingInstruction("foo"))
root.addnext(etree.ProcessingInstruction("bar"))

print(etree.tounicode(root))
# => "<ROOT/>"

print(etree.tounicode(root.getroottree()))
# => "<?foo ?><ROOT/><?bar ?>"
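The lxml documentation marks tounicode() as deprecated in favour of tostring() with encoding="unicode". A sketch of the same comparison with tostring() follows; the variable names are illustrative, and the exact whitespace inside an empty PI may differ between lxml versions, so no output is shown for the tree.

```python
from lxml import etree

root = etree.XML("<ROOT/>")
root.addprevious(etree.ProcessingInstruction("foo"))
root.addnext(etree.ProcessingInstruction("bar"))

# Serializing the element alone leaves out the sibling PIs.
element_only = etree.tostring(root, encoding="unicode")

# Serializing the tree includes everything at document level.
document = etree.tostring(root.getroottree(), encoding="unicode")

# Bytes output with an XML declaration, e.g. for writing to a file.
as_bytes = etree.tostring(root.getroottree(), encoding="UTF-8", xml_declaration=True)

print(element_only)
print(document)
print(as_bytes)
```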
Laurent LAPORTE