
I'm trying to build an XML file in Python so I can write it out to a file, but I'm running into problems with newlines, tabs and so on.

I cannot use a module to do this because I'm using a cut-down version of Python 2. It must all be in pure Python.

For instance, how can I create an XML file with this kind of formatting, keeping all the newlines and tabs (whitespace)?

e.g.

<?xml version="1.0" encoding="UTF-8"?>
<myfiledata>
    <mydata>
            blahblah
    </mydata>
</myfiledata>

I've tried enclosing each line in quotes and appending a newline:

'    <myfiledata>' +\n
'                blahblah' +\n

etc.

However, the output I'm getting from the script is nothing like how it looks in my Python file: there is extra whitespace and the newlines aren't working properly.

Is there a definitive way to do this? I would rather edit a file that looks somewhat like what I will end up with, for clarity's sake.

Ke.
  • Have a look at the [`lxml`](https://pypi.python.org/pypi/lxml) package and see if it can help you out. – wwii Mar 26 '16 at 16:35
  • Sorry, I updated the answer. I can't use lxml; it's got to be pure Python. – Ke. Mar 26 '16 at 16:36
  • You probably want a multi-line string. See http://stackoverflow.com/questions/2504411/proper-indentation-for-python-multiline-strings for some ideas. – Xiongbing Jin Mar 26 '16 at 16:38
  • Can you use `xml.etree.ElementTree` from the standard library? – alecxe Mar 26 '16 at 16:40
  • Then maybe [string formatting](https://docs.python.org/3/library/string.html#format-string-syntax) will help. – wwii Mar 26 '16 at 16:41
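
The multi-line string and string formatting suggestions in the comments above can be combined without importing anything. The following is only a minimal sketch: `XML_TEMPLATE`, `escape_text` and `render` are hypothetical names made up for illustration, and the escaping covers just the three characters that matter inside element text.

```python
# A triple-quoted string keeps newlines and indentation exactly as typed,
# so the template looks like the file that will be written out.
XML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<myfiledata>
    <mydata>
        %(payload)s
    </mydata>
</myfiledata>
"""


def escape_text(value):
    """Escape characters that are special inside XML element text."""
    # '&' must be replaced first, otherwise the other entities get re-escaped.
    return value.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')


def render(payload):
    """Fill the template with escaped text (hypothetical helper)."""
    return XML_TEMPLATE % {'payload': escape_text(payload)}


document = render('blah & blah')
print(document)
```

The resulting string can then be written with an ordinary `open(...).write(document)`. Note that the whitespace around the payload becomes part of the element's text.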

2 Answers

You can use XMLGenerator from xml.sax.saxutils to generate the XML and xml.dom.minidom to parse it and pretty-print it (both modules are in the Python 2 standard library).

Sample code creating an XML document and pretty-printing it:

from __future__ import print_function
from xml.sax.saxutils import XMLGenerator
import io
import xml.dom.minidom

def pprint_xml_string(s):
    """Pretty-print an XML string with minidom"""
    parsed = xml.dom.minidom.parse(io.BytesIO(s))
    return parsed.toprettyxml()

# create an XML document in memory:
fp = io.BytesIO()
xg = XMLGenerator(fp)

xg.startDocument()
xg.startElement('root', {})

xg.startElement('subitem', {})
xg.characters('text content')
xg.endElement('subitem')

xg.startElement('subitem', {})
xg.characters('text content for another subitem')
xg.endElement('subitem')

xg.endElement('root')
xg.endDocument()

# pretty-print it
xml_string = fp.getvalue()
pretty_xml = pprint_xml_string(xml_string)
print(pretty_xml)

Output is:

<?xml version="1.0" ?>
<root>
    <subitem>text content</subitem>
    <subitem>text content for another subitem</subitem>
</root>

Note that the text content of the elements (wrapped in <subitem> tags) isn't indented, because doing so would change the content (XML doesn't ignore whitespace the way HTML does).
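
To illustrate the point about whitespace, here is a small sketch (the variable names are made up for this example) that parses the same element with and without indentation around its text and compares what the parser reports:

```python
from xml.dom import minidom

# Same element, once compact and once with the text on its own indented line.
compact = minidom.parseString('<subitem>text content</subitem>')
indented = minidom.parseString('<subitem>\n    text content\n</subitem>')

# The first child of each root element is its text node.
compact_text = compact.documentElement.firstChild.data
indented_text = indented.documentElement.firstChild.data

print(repr(compact_text))
print(repr(indented_text))

# The two only match once the surrounding whitespace is stripped.
print(compact_text == indented_text)
print(compact_text == indented_text.strip())
```

The newline and leading spaces are reported as part of the text, which is why a pretty-printer that wants to preserve content should leave text-only elements on one line.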

martineau
Elias Dorneles

The answer was to use xml.etree.ElementTree together with minidom (from xml.dom import minidom).

Both are available in Python 2.5.
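
The answer does not include code, so the following is only a guess at what that combination might look like: build the tree with `xml.etree.ElementTree`, serialise it, then let minidom re-indent it. The element names come from the question; `build_pretty_xml` is a hypothetical helper name.

```python
from xml.etree import ElementTree as ET
from xml.dom import minidom


def build_pretty_xml():
    """Build the document from the question and return it pretty-printed."""
    root = ET.Element('myfiledata')
    child = ET.SubElement(root, 'mydata')
    child.text = 'blahblah'

    # Serialise without any indentation first...
    rough = ET.tostring(root, 'utf-8')
    # ...then let minidom re-indent it with four spaces per level.
    return minidom.parseString(rough).toprettyxml(indent='    ')


pretty = build_pretty_xml()
print(pretty)
```

The exact layout of text nodes in the `toprettyxml` output differs between Python versions, so it is worth checking the result on the interpreter actually in use.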

Ke.