
I want to edit existing XML config files from Python while preserving the file's formatting and comments, so that it stays human-readable.

I will be updating existing XML elements and changing values as well as adding new XML elements to the file.

Available XML parsers such as ElementTree and lxml are great for editing XML files, but you lose the original formatting (when adding new elements to the file) and the comments that were in the file.
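For reference, the comment loss in the standard-library ElementTree is a parser default rather than a hard limit. The following is a minimal sketch, assuming Python 3.8 or later (where `TreeBuilder` accepts `insert_comments=True`); the sample document and the `host` element are made up for illustration. It updates one value and appends one element, reusing the whitespace that is already in the tree so the new element lines up with its siblings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample config, used only for illustration.
SAMPLE = """<config>
  <!-- port the server listens on -->
  <port>8080</port>
</config>"""

# insert_comments=True keeps comment nodes in the tree (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(SAMPLE, parser=parser)

# Update an existing value.
root.find("port").text = "9090"

# Append a new element and reuse the existing indentation:
# root.text holds the whitespace before the first child, and the
# last child's tail holds the whitespace before the closing tag.
last = root[-1]
new = ET.Element("host")
new.text = "localhost"
new.tail = last.tail
last.tail = root.text
root.append(new)

result = ET.tostring(root, encoding="unicode")
print(result)
```

This only covers the simple case where every child of an element is indented the same way. Details such as attribute quoting style or the XML declaration are still normalized by the serializer.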

Regular expressions seem to be an option, but I know they are not recommended for XML.

So I'm looking for something along the lines of a Pythonic XML file editor. What is the best way to go about this? Thanks.

UDJ
    possible duplicate of [Python: Update XML-file using ElementTree while conserving layout as much as possible](http://stackoverflow.com/questions/9579700/python-update-xml-file-using-elementtree-while-conserving-layout-as-much-as-pos) – Cory Kramer Jun 27 '14 at 14:12

2 Answers


I recommend parsing the XML document with a SAX parser. This gives you great flexibility to make your changes and to write the document back as it was.

Take a look at the xml.sax modules (see Python's documentation).
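One way this could look is sketched below, assuming the goal is to replace the text of a single element. The class `ConfigRewriter` and the function `rewrite` are hypothetical names, not part of `xml.sax`. The handler extends `xml.sax.saxutils.XMLGenerator`, which echoes every SAX event back out as XML, and it is also registered as the lexical handler so that comments are passed through instead of being dropped.

```python
import io
import xml.sax
from xml.sax.handler import property_lexical_handler
from xml.sax.saxutils import XMLGenerator


class ConfigRewriter(XMLGenerator):
    """Echo the document, replacing the text of one element (hypothetical helper)."""

    def __init__(self, out, target, new_text):
        super().__init__(out, encoding="utf-8")
        self._sink = out          # same stream XMLGenerator writes to
        self._target = target     # name of the element to rewrite
        self._new_text = new_text
        self._in_target = False

    def startElement(self, name, attrs):
        super().startElement(name, attrs)
        if name == self._target:
            self._in_target = True
            super().characters(self._new_text)   # emit the replacement text

    def endElement(self, name):
        if name == self._target:
            self._in_target = False
        super().endElement(name)

    def characters(self, content):
        if not self._in_target:                  # drop the old text of the target
            super().characters(content)

    # Lexical events: the expat reader calls these once the handler is
    # registered through property_lexical_handler.
    def comment(self, text):
        self._sink.write("<!--" + text + "-->")

    def startCDATA(self):
        pass

    def endCDATA(self):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def rewrite(xml_text, target, new_text):
    out = io.StringIO()
    handler = ConfigRewriter(out, target, new_text)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setProperty(property_lexical_handler, handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


source = """<config>
  <!-- port the server listens on -->
  <port>8080</port>
</config>"""
result = rewrite(source, "port", "9090")
print(result)
```

Whitespace between elements arrives as ordinary character data, so the existing indentation is echoed unchanged. This sketch does not preserve CDATA markers or a DOCTYPE, and `XMLGenerator` writes its own XML declaration and expands empty elements such as `<a/>` to `<a></a>`.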

mguijarr

I recently wrote a class that uses jinja2 to serialize XML parsed by lxml in a specific format. If you can codify the actual format of your XML document, you may be able to adapt it to your needs:

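import jinja2
from lxml import etree

class XMLWriter:
    def __init__(self):
        self.env = jinja2.Environment()
        # depth: number of element ancestors, used to compute indentation
        self.env.filters['depth'] = lambda node: len(list(node.iterancestors('*')))
        # is_comment: lxml comment nodes have etree.Comment as their tag
        self.env.filters['is_comment'] = lambda node: node.tag is etree.Comment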
        self.template = """<?xml version="1.0" encoding="utf-8"?>
{%- for node in rootnode recursive -%}
  {{- '\n' + '  '*node|depth -}}
  {%- if node|is_comment -%}
    {{- node -}}
  {%- else -%}
    <{{- node.tag -}}
    {%- for key,value in node.attrib.items() -%}
      {{ '\n  ' + '  '*node|depth }}{{ key }}="{{ value }}"
    {%- endfor -%}
    {% if node|count %}>{% endif %}
    {{- loop(node) -}}
    {% if node|count %}{{ '\n' + '  '*node|depth }}</{{ node.tag }}>{% else %}/>{%- endif -%}
  {%- endif -%}
{%- endfor -%}"""

    def __call__(self, rootnode):
        return self.serialize(rootnode)
    def serialize(self, rootnode):
        return self.env.from_string(self.template).render(rootnode=[rootnode])

You use the class like this:

from lxml import etree
root = etree.fromstring(xml_to_parse, parser=etree.ETCompatXMLParser(remove_comments=False))
# do any modifications you like
writer = XMLWriter()
formatted_result = writer(root)
dustyrockpyle