Editing XML as a dictionary in python?

Question

I'm trying to generate customized xml files from a template xml file in python.

Conceptually, I want to read in the template xml, remove some elements, change some text attributes, and write the new xml out to a file. I wanted it to work something like this:

conf_base = ConvertXmlToDict('config-template.xml')
conf_base_dict = conf_base.UnWrap()
del conf_base_dict['root-name']['level1-name']['leaf1']
del conf_base_dict['root-name']['level1-name']['leaf2']

conf_new = ConvertDictToXml(conf_base_dict)

now I want to write to file, but I don't see how to get to ElementTree.ElementTree.write()

conf_new.write('config-new.xml')

Is there some way to do this, or can someone suggest doing this a different way?

score 19 · Answer 1 · edited Feb 22 '21 at 13:33

This'll get you a dict minus attributes. I don't know, if this is useful to anyone. I was looking for an xml to dict solution myself, when I came up with this.


      
import xml.etree.ElementTree as etree

tree = etree.parse('test.xml')
root = tree.getroot()

def xml_to_dict(el):
  d={}
  if el.text:
    d[el.tag] = el.text
  else:
    d[el.tag] = {}
  children = el.getchildren()
  if children:
    d[el.tag] = map(xml_to_dict, children)
  return d

This: http://www.w3schools.com/XML/note.xml

<note>
 <to>Tove</to>
 <from>Jani</from>
 <heading>Reminder</heading>
 <body>Don't forget me this weekend!</body>
</note>

Would equal this:


{'note': [{'to': 'Tove'},
          {'from': 'Jani'},
          {'heading': 'Reminder'},
          {'body': "Don't forget me this weekend!"}]}

This was exactly what I was looking for. And use of `map` gets bonus points for me. Well done. — Eric Cope, Sep 12 '11 at 21:54

score 11 · Answer 2 · answered Sep 24 '08 at 14:58

I'm not sure if converting the info set to nested dicts first is easier. Using ElementTree, you can do this:

import xml.etree.ElementTree as ET
doc = ET.parse("template.xml")
lvl1 = doc.findall("level1-name")[0]
lvl1.remove(lvl1.find("leaf1")
lvl1.remove(lvl1.find("leaf2")
# or use del lvl1[idx]
doc.write("config-new.xml")

ElementTree was designed so that you don't have to convert your XML trees to lists and attributes first, since it uses exactly that internally.

It also support as small subset of XPath.

Might as well just use `find` on the `lvl1` assignment, rather than `findall` and getting the first element. — Xiong Chiamiov, Sep 03 '11 at 02:46

score 8 · Accepted Answer · edited Mar 19 '12 at 18:40

For easy manipulation of XML in python, I like the Beautiful Soup library. It works something like this:

Sample XML File:

<root>
  <level1>leaf1</level1>
  <level2>leaf2</level2>
</root>

Python code:

from BeautifulSoup import BeautifulStoneSoup, Tag, NavigableString

soup = BeautifulStoneSoup('config-template.xml') # get the parser for the xml file
soup.contents[0].name
# u'root'

You can use the node names as methods:

soup.root.contents[0].name
# u'level1'

It is also possible to use regexes:

import re
tags_starting_with_level = soup.findAll(re.compile('^level'))
for tag in tags_starting_with_level: print tag.name
# level1
# level2

Adding and inserting new nodes is pretty straightforward:

# build and insert a new level with a new leaf
level3 = Tag(soup, 'level3')
level3.insert(0, NavigableString('leaf3')
soup.root.insert(2, level3)

print soup.prettify()
# <root>
#  <level1>
#   leaf1
#  </level1>
#  <level2>
#   leaf2
#  </level2>
#  <level3>
#   leaf3
#  </level3>
# </root>

BeautifulSoup converts everything to lower case. That really sucks. I have to preserve the cases of tags and values! — user236215, May 19 '10 at 21:13
The author of BeautifulSoup says it does this because HTMLParser does it. "If you need to preserve tag case, try lxml". — nealmcb, Mar 21 '12 at 18:14

Mark · Answer 4 · 2011-05-22T16:14:17.707

My modification of Daniel's answer, to give a marginally neater dictionary:

def xml_to_dictionary(element):
    l = len(namespace)
    dictionary={}
    tag = element.tag[l:]
    if element.text:
        if (element.text == ' '):
            dictionary[tag] = {}
        else:
            dictionary[tag] = element.text
    children = element.getchildren()
    if children:
        subdictionary = {}
        for child in children:
            for k,v in xml_to_dictionary(child).items():
                if k in subdictionary:
                    if ( isinstance(subdictionary[k], list)):
                        subdictionary[k].append(v)
                    else:
                        subdictionary[k] = [subdictionary[k], v]
                else:
                    subdictionary[k] = v
        if (dictionary[tag] == {}):
            dictionary[tag] = subdictionary
        else:
            dictionary[tag] = [dictionary[tag], subdictionary]
    if element.attrib:
        attribs = {}
        for k,v in element.attrib.items():
            attribs[k] = v
        if (dictionary[tag] == {}):
            dictionary[tag] = attribs
        else:
            dictionary[tag] = [dictionary[tag], attribs]
    return dictionary

namespace is the xmlns string, including braces, that ElementTree prepends to all tags, so here I've cleared it as there is one namespace for the entire document

NB that I adjusted the raw xml too, so that 'empty' tags would produce at most a ' ' text property in the ElementTree representation

spacepattern = re.compile(r'\s+')
mydictionary = xml_to_dictionary(ElementTree.XML(spacepattern.sub(' ', content)))

would give for instance

{'note': {'to': 'Tove',
         'from': 'Jani',
         'heading': 'Reminder',
         'body': "Don't forget me this weekend!"}}

it's designed for specific xml that is basically equivalent to json, should handle element attributes such as

<elementName attributeName='attributeContent'>elementContent</elementName>

too

there's the possibility of merging the attribute dictionary / subtag dictionary similarly to how repeat subtags are merged, although nesting the lists seems kind of appropriate :-)

score 1 · Answer 5 · edited May 23 '17 at 12:01

Adding this line

d.update(('@' + k, v) for k, v in el.attrib.iteritems())

in the user247686's code you can have node attributes too.

Found it in this post https://stackoverflow.com/a/7684581/1395962

Example:

import xml.etree.ElementTree as etree
from urllib import urlopen

xml_file = "http://your_xml_url"
tree = etree.parse(urlopen(xml_file))
root = tree.getroot()

def xml_to_dict(el):
    d={}
    if el.text:
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = el.getchildren()
    if children:
        d[el.tag] = map(xml_to_dict, children)

    d.update(('@' + k, v) for k, v in el.attrib.iteritems())

    return d

Call as

xml_to_dict(root)

score 0 · Answer 6 · answered Sep 24 '08 at 15:05

0

Have you tried this?

print xml.etree.ElementTree.tostring( conf_new )

answered Sep 24 '08 at 15:05

S.Lott

384,516
81
508
779

Loooo · Answer 7 · 2010-03-30T13:03:28.953

0

most direct way to me :

root        = ET.parse(xh)
data        = root.getroot()
xdic        = {}
if data > None:
    for part in data.getchildren():
        xdic[part.tag] = part.text

edited Mar 30 '10 at 13:03

answered Mar 30 '10 at 12:57

Loooo

1
1

score 0 · Answer 8 · answered Mar 22 '12 at 01:31

XML has a rich infoset, and it takes some special tricks to represent that in a Python dictionary. Elements are ordered, attributes are distinguished from element bodies, etc.

One project to handle round-trips between XML and Python dictionaries, with some configuration options to handle the tradeoffs in different ways is XML Support in Pickling Tools. Version 1.3 and newer is required. It isn't pure Python (and in fact is designed to make C++ / Python interaction easier), but it might be appropriate for various use cases.

Editing XML as a dictionary in python?

8 Answers8

Linked