I'm using the lxml module to generate XML files in Python.
We need to emit some entity references that our external system will resolve when it parses the XML. Normally, the text of each element is escaped when the tree is serialized to an XML string:
from lxml import etree
root = etree.Element("root")
sub = etree.Element("sub")
sub.text = "&entity;text"
root.append(sub)
print(etree.tostring(root, encoding="unicode"))
<root><sub>&amp;entity;text</sub></root>   # I want this without the escaping
I found that the lxml.etree.Entity class is useful for this purpose:
root = etree.Element("root")
sub = etree.Element("sub")
entity = etree.Entity("entity")
entity.tail = "text"
sub.append(entity)
root.append(sub)
print(etree.tostring(root, encoding="unicode"))
<root><sub>&entity;text</sub></root>
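The same idea extends to text that mixes several references. Below is a minimal sketch of a hypothetical helper, `set_text_with_entities` (my own name, not an lxml API), that splits a string on `&name;` patterns and turns each one into an `etree.Entity` node. It assumes only plain named references (no `&#38;`-style character references) and a freshly created element that has no children yet.

```python
import re
from lxml import etree

# Plain named entity references such as "&aaa;" (assumption: character
# references like "&#38;" do not need to be handled).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

def set_text_with_entities(elem, text):
    """Hypothetical helper: fill an empty element, turning &name; into Entity nodes."""
    # With one capturing group, re.split returns [text, name, text, name, ..., text].
    parts = ENTITY_REF.split(text)
    elem.text = parts[0] or None
    for i in range(1, len(parts), 2):
        entity = etree.Entity(parts[i])
        entity.tail = parts[i + 1] or None  # text that follows the reference
        elem.append(entity)
    return elem

root = etree.Element("root")
sub = etree.SubElement(root, "sub")
set_text_with_entities(sub, "&aaa;bbb")
print(etree.tostring(root, encoding="unicode"))
```

Because each reference becomes a real entity-reference node, the serializer writes it as `&aaa;` instead of escaping the ampersand.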
However, if we try to put an entity reference into an attribute value, it fails:
root = etree.Element("root")
sub = etree.Element("sub")
entity = etree.Entity("entity")
entity.tail = "text"
sub.attrib["foo"] = entity
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-62cb8ef3a9a6> in <module>()
----> 1 sub.attrib["foo"] = entity
lxml.etree.pyx in lxml.etree._Attrib.__setitem__ (src/lxml/lxml.etree.c:58775)()
apihelpers.pxi in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:19025)()
apihelpers.pxi in lxml.etree._utf8 (src/lxml/lxml.etree.c:26460)()
TypeError: Argument must be bytes or unicode, got '_Entity'
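As the traceback shows, an attribute value has to be a plain `str` or `bytes`; as far as I know lxml has no attribute-level counterpart of `Entity`, and the serializer escapes every `&` in an attribute as `&amp;`. One possible workaround, sketched below under the assumption that post-processing the serialized string is acceptable, is to store the reference as ordinary text and undo the escaping afterwards for a known list of entity names. `restore_entity_refs` is a hypothetical helper, not part of lxml.

```python
from lxml import etree

def restore_entity_refs(xml_text, names):
    """Hypothetical post-processing step: turn '&amp;name;' back into '&name;'.

    Only the listed entity names are touched, so any other escaped
    ampersand in the document stays escaped.
    """
    for name in names:
        xml_text = xml_text.replace("&amp;%s;" % name, "&%s;" % name)
    return xml_text

sub = etree.Element("sub")
sub.set("bar", "&ent;bas")  # stored as plain text; '&' is escaped on output
raw = etree.tostring(sub, encoding="unicode")
fixed = restore_entity_refs(raw, ["ent"])
print(raw)
print(fixed)
```

The fixed-up string is only well-formed XML if `ent` is declared in the document's DOCTYPE, and a literal `&ent;` placed in element text would be rewritten too.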
What I want to produce is something like this:
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE foo [
<!ENTITY ent "entity" >
<!ENTITY aaa "aaaaaa" >
]>
<foo>
<sub bar="&ent;bas">&aaa;bbb</sub>
</foo>
How can we generate output like this with lxml?
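A minimal end-to-end sketch that produces the target document, under three assumptions: the internal DTD subset can be passed as a raw string through the `doctype` argument of `etree.tostring` (which, as far as I know, writes it out verbatim without validation), element content uses `etree.Entity` as above, and attribute references are fixed up by string replacement after serialization. `ENTITIES` and `build_doctype` are illustrative names, and entity values are assumed not to contain `"`.

```python
from lxml import etree

# Illustrative entity table, taken from the desired output in the question.
ENTITIES = {"ent": "entity", "aaa": "aaaaaa"}

def build_doctype(root_name, entities):
    """Hypothetical helper: DOCTYPE with an internal subset of entity declarations."""
    decls = "\n".join('<!ENTITY %s "%s" >' % (n, v) for n, v in entities.items())
    return "<!DOCTYPE %s [\n%s\n]>" % (root_name, decls)

root = etree.Element("foo")
sub = etree.SubElement(root, "sub")
sub.set("bar", "&ent;bas")   # escaped to &amp;ent; on output, fixed up below
aaa = etree.Entity("aaa")    # element content: a real entity-reference node
aaa.tail = "bbb"
sub.append(aaa)

xml = etree.tostring(
    root,
    xml_declaration=True,
    encoding="utf-8",
    doctype=build_doctype("foo", ENTITIES),
    pretty_print=True,
).decode("utf-8")

# Undo the attribute escaping, but only for the entities declared above.
for name in ENTITIES:
    xml = xml.replace("&amp;%s;" % name, "&%s;" % name)

print(xml)
```

The string replacement is the fragile part of this approach: it assumes the placeholder text `&ent;` never appears anywhere it should stay escaped.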