To be clear, xmltodict has no officially supported way to keep the CDATA section.
You can check the issue here.
Given that, you have to do it yourself. There are two approaches.
First, let's create some helper functions that both of them use.
def cdata(s):
    # Wrap a string in a CDATA section.
    return '<![CDATA[' + s + ']]>'

def preprocessor(key, value):
    '''Unnecessary if you've manually wrapped the values. For example,
    xmltodict.unparse({
        'node1': {'node2': '<![CDATA[test]]>', 'node3': 'test'}
    })
    '''
    if key in KEEP_CDATA_SECTION:
        if isinstance(value, dict) and '#text' in value:
            # Node with attributes: only its text gets wrapped.
            value['#text'] = cdata(value['#text'])
        elif isinstance(value, str):
            # Plain text node; anything else is left untouched.
            value = cdata(value)
    return key, value
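To make the behavior of the preprocessor concrete, here is a small standalone sketch that calls it directly, without xmltodict. It repeats the helpers so it runs on its own (with a string check so that non-string values pass through), and the keys and values are made up for illustration.

```python
KEEP_CDATA_SECTION = ['node2']  # example key list, same as used later on

def cdata(s):
    return '<![CDATA[' + s + ']]>'

def preprocessor(key, value):
    if key in KEEP_CDATA_SECTION:
        if isinstance(value, dict) and '#text' in value:
            value['#text'] = cdata(value['#text'])
        elif isinstance(value, str):
            value = cdata(value)
    return key, value

# A plain string value is wrapped as a whole.
plain = preprocessor('node2', 'test')
# A dict value (attributes plus '#text') only has its text wrapped.
with_attr = preprocessor('node2', {'@id': '1', '#text': 'test'})
# Keys that are not listed pass through unchanged.
untouched = preprocessor('node3', 'test')

print(plain)
print(with_attr)
print(untouched)
```

Note that the dict case modifies the value in place, so the input data is changed as well.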
- Unescaping the escaped XML
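import xmltodict
from xml.sax.saxutils import unescape

KEEP_CDATA_SECTION = ['node2']

# Example input; replace it with your own data.
data = {'node1': {'node2': 'test', 'node3': 'test'}}

out_xml = xmltodict.unparse(data, preprocessor=preprocessor)
out_xml = unescape(out_xml)  # not safe!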
Do not use this on untrusted data: unescape() runs over the whole document, so it unescapes not only the character data but also the attribute values of every node.
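The following sketch illustrates the problem using only the standard library. The document here is built by hand as a stand-in for serializer output (it is an assumption for illustration, not what xmltodict emits byte for byte), and is_well_formed is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr, unescape

def is_well_formed(xml_text):
    # Hypothetical helper: True if the text parses as XML.
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Both the attribute value and the character data are escaped,
# as a serializer would do.
escaped_doc = '<node attr=%s>%s</node>' % (
    quoteattr('a < b'),
    escape('<![CDATA[test]]>'),
)

# Unescaping the whole document restores the CDATA section,
# but it also puts a raw '<' back into the attribute value.
unescaped_doc = unescape(escaped_doc)

print(escaped_doc)
print(unescaped_doc)
print(is_well_formed(escaped_doc), is_well_formed(unescaped_doc))
```

The CDATA section comes back as intended, but the attribute now contains a literal `<`, which makes the document ill-formed.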
- Subclassing XMLGenerator
To alleviate the safety problem of unescape(), we can remove the escape() call in XMLGenerator so that there is no need to unescape the XML afterwards.
class XMLGenerator(xmltodict.XMLGenerator):
    def characters(self, content):
        if content:
            self._finish_pending_start_element()
            # Write the text as-is, without calling escape().
            self._write(content)  # also not safe, but better!

# Make unparse() use the subclass.
xmltodict.XMLGenerator = XMLGenerator
This is not a hack: rebinding xmltodict.XMLGenerator only affects unparse() and leaves the rest of xmltodict unchanged. More importantly, it does not pollute the built-in xml library.
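To see what "also not safe, but better" means in practice, here is a sketch that drives the same kind of subclass directly, built on the standard library's xml.sax.saxutils.XMLGenerator rather than through xmltodict. RawTextGenerator and render are hypothetical names used only for this illustration.

```python
import io
from xml.sax.saxutils import XMLGenerator

class RawTextGenerator(XMLGenerator):  # hypothetical name
    def characters(self, content):
        if content:
            self._finish_pending_start_element()
            self._write(content)  # text is written without escape()

def render(text, attr):
    # Serialize a single <node> element with one attribute and some text.
    out = io.StringIO()
    gen = RawTextGenerator(out, 'utf-8')
    gen.startElement('node', {'attr': attr})
    gen.characters(text)
    gen.endElement('node')
    return out.getvalue()

# Better: the attribute value is still escaped, the CDATA text is kept.
good = render('<![CDATA[x < y]]>', 'a < b')
# Still not safe: text that is not wrapped in CDATA is written raw.
bad = render('x < y', 'a < b')

print(good)
print(bad)
```

Attribute values keep going through the normal escaping path, which is the improvement over unescape(). Text nodes do not, so any text you have not wrapped in CDATA must not contain markup characters.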
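For fans of one-liners (unlike the subclass, this rebinds a method on the class itself, so every user of that class is affected):
xmltodict.XMLGenerator.characters = xmltodict.XMLGenerator.ignorableWhitespace  # now, it is a hack!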
Going further, you can wrap the character data directly in XMLGenerator, like the following.
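class XMLGenerator(xmltodict.XMLGenerator):
    def characters(self, content):
        if content:
            self._finish_pending_start_element()
            # Wrap every piece of character data in CDATA.
            self._write(cdata(content))

# Make unparse() use the subclass.
xmltodict.XMLGenerator = XMLGenerator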
From now on, every node that has character data will keep the CDATA section. Do not combine this with the preprocessor above, or the text will be wrapped twice.
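One caveat when wrapping arbitrary text: a CDATA section ends at the first `]]>`, so text that already contains that sequence would close the section early. A common workaround is to split the sequence across two adjacent sections. The sketch below shows this with a hypothetical cdata_safe helper, which is not part of xmltodict or of the helpers above.

```python
import xml.etree.ElementTree as ET

def cdata_safe(s):  # hypothetical helper
    # Split every ']]>' so that ']]' ends one section and '>' starts the next.
    return '<![CDATA[' + s.replace(']]>', ']]]]><![CDATA[>') + ']]>'

wrapped = cdata_safe('a ]]> b')
# Parse it back to check that the original text survives.
roundtrip = ET.fromstring('<r>' + wrapped + '</r>').text

print(wrapped)
print(roundtrip)
```

If your data can contain `]]>`, swapping this in for cdata() keeps the output well-formed.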