
I'm trying to write and read back a set of strings stored in elements called object. Each element has two attributes: name (a simple string) and body, which is a string containing special characters such as "\n" and "\". I'm using the following code to write the XML file:

from xml.dom.minidom import Document

doc = Document()
root = doc.createElement('data')
doc.appendChild(root)
# create a scene
scene = doc.createElement('scene')
root.appendChild(scene)
# add an object element (named obj to avoid shadowing the built-in object)
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')

txt = 'Text\nsome text\nanother one\\and so on\n'
obj.setAttribute('body', txt)
scene.appendChild(obj)

# write to the same file that the parsing code reads
with open("file.xml", "wb") as file_handle:
    file_handle.write(bytes(doc.toprettyxml(indent='\t'), 'UTF-8'))

It produces this file:

<?xml version="1.0" ?>
<data>
    <scene>
        <object body="Text
some text
another one\and so on
" name="obj1"/>
    </scene>
</data>

And for parsing:

from xml.dom import minidom

filepath = 'file.xml'
dom = minidom.parse(filepath)
scenes = dom.getElementsByTagName('scene')
for scene in scenes:
    txt_objs = scene.getElementsByTagName('object')
    for obj in txt_objs:
        obj_name = obj.getAttribute('name')
        obj_body = obj.getAttribute('body')
        print(obj_name, "  ", obj_body)

The output of the parser is not the same as what was stored: the newline characters are lost. How do I get back the same string that I put in?

#parser output
obj1    Text some text another one\and so on

What is the proper way of storing and retrieving a string with special characters?

Chebhou
  • Take a look at [lxml](http://lxml.de). It's very easy to get started with and can save a lot of effort. – Mauro Baraldi Apr 01 '15 at 03:35
  • @MauroBaraldi Thanks for the suggestion, but I can't use an external library. This script is for Blender 3D, so I can only use what ships with Python by default. – Chebhou Apr 01 '15 at 03:44

1 Answer


The behavior shown by minidom is in line with the W3C recommendation. See the following discussion: "Are line breaks in XML attribute values valid?". I have quoted the answer by @JanCetkovsky here for easy reference:

It is valid, however according to W3C recommendation your XML parser should normalize the all whitespace characters to space (0x20) - so the output of your examples will differ (you should have new line on the output for "&#10;", but only space in the first case). [Source]
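The normalization described in the quote can be seen directly with minidom. This is a minimal sketch: the two XML snippets and the variable names are made up for illustration. The first attribute holds a literal line break, the second holds the character reference `&#10;`.

```python
from xml.dom import minidom

# A literal newline inside an attribute value is normalized to a space on parsing.
literal = minidom.parseString('<object body="line one\nline two"/>')
literal_body = literal.documentElement.getAttribute('body')

# A newline written as the character reference &#10; survives parsing.
escaped = minidom.parseString('<object body="line one&#10;line two"/>')
escaped_body = escaped.documentElement.getAttribute('body')

print(repr(literal_body))  # 'line one line two'
print(repr(escaped_body))  # 'line one\nline two'
```

Only the literal line break is replaced, which matches the parser output in the question.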

If you have control over the XML document structure (which seems to be the case, since you construct the XML yourself), store the text as the content of the XML element instead of as an attribute value:

.....
#add object element
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')

txt = 'Text\nsome text\nanother one\\and so on\n'
txt_node = doc.createTextNode(txt)
obj.appendChild(txt_node)
scene.appendChild(obj)
.....
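For completeness, here is a sketch of the full round trip with the text stored as element content, using only the standard library. It serializes to a string and parses it back with `parseString` instead of going through a file, and the `restored` dictionary is a hypothetical helper for collecting the results. Reading the body now means joining the element's text nodes rather than calling `getAttribute('body')`.

```python
from xml.dom import minidom

txt = 'Text\nsome text\nanother one\\and so on\n'

# Build the document with the body as a text node instead of an attribute.
doc = minidom.Document()
root = doc.createElement('data')
doc.appendChild(root)
scene = doc.createElement('scene')
root.appendChild(scene)
obj = doc.createElement('object')
obj.setAttribute('name', 'obj1')
obj.appendChild(doc.createTextNode(txt))
scene.appendChild(obj)

# Serialize to a string (the same text could be written to a file instead).
xml_text = doc.toprettyxml(indent='\t')

# Parse it back and collect each object's text content by name.
dom = minidom.parseString(xml_text)
restored = {}
for node in dom.getElementsByTagName('object'):
    body = ''.join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)
    restored[node.getAttribute('name')] = body

print(restored['obj1'] == txt)
```

Line feeds and backslashes in element content come back unchanged. One caveat: XML parsers convert carriage returns (`\r` and `\r\n`) in the input to `\n`, so strings containing those would not round-trip exactly this way.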
har07