I have this XML:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <SOAP-ENV:Body>
        <m:request xmlns:m="http://www.datapower.com/schemas/management" domain="XXXXX">
            <m:do-action>
                <FlushDocumentCache>
                    <XMLManager class="XMLManager">default</XMLManager>
                </FlushDocumentCache>
                <FlushStylesheetCache>
                    <XMLManager class="XMLManager">default</XMLManager>
                </FlushStylesheetCache>
            </m:do-action>
        </m:request>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

I want to change only the value XXXXX of the domain attribute.

I did something like this:

import xml.etree.ElementTree as etree
tree = etree.parse('input.xml')
# HOW TO FIND THE VALUE XXXXX AND CHANGE IT WITH A NEW VALUE ???
tree.write('output.xml')

Thanks.

CristiFati
Bouchaib Mounir

1 Answer

A couple of notes:

  • Parsing the XML from a file and then writing it to another file does not yield an identical document, because ElementTree alters it on the way out. You can verify this by running the code you posted (without its 3rd line, obviously):

    import xml.etree.ElementTree as etree
    tree = etree.parse('input.xml')
    tree.write('output.xml')
    
  • All the SOAP-ENV:* nodes have been converted to ns0:*, and the m:* nodes to ns1:*. To avoid that, I had to copy the prefixes from the XML file into the code (the soap_env_ns_name and m_ns_name variables), as explained here: Saving XML using ETree in Python. It's not retaining namespaces, and adding ns0, ns1 and removing xmlns tags.

  • The SOAP-ENC namespace and the standard xsi and xsd namespaces have been removed, since nothing in the XML references them. Also, the m declaration has been moved from the request node to the Envelope (root) node; I'm not sure whether that's part of the standard, but in most XML files I've seen the namespaces are declared on the root node. Either way, there's nothing you can do about it here: ElementTree only writes the namespaces that are actually used, and it declares them all on the root node.

  • The bottom line is that you won't get the exact same output (unless you want to write your own parser, as described here: Python: Update XML-file using ElementTree while conserving layout as much as possible).
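
To make the bullets above concrete, here is a minimal sketch (assuming Python 3) that round-trips a shortened version of the XML in memory, once before and once after registering the prefixes. The `xml_text`, `before` and `after` names are illustrative only and are not part of the code below.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
MGMT = "http://www.datapower.com/schemas/management"

# Shortened, hypothetical version of the question's document.
xml_text = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="%s" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<SOAP-ENV:Body>'
    '<m:request xmlns:m="%s" domain="XXXXX"/>'
    '</SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>'
) % (SOAP_ENV, MGMT)

# Round trip with no prefixes registered: generated prefixes are used.
before = ET.tostring(ET.fromstring(xml_text), encoding="unicode")

# Register the prefixes found in the file, then round trip again.
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("m", MGMT)
after = ET.tostring(ET.fromstring(xml_text), encoding="unicode")

print(before)
print(after)
```

The first printed line should use the generated ns0/ns1 prefixes, while the second keeps SOAP-ENV and m. In both cases the unused xsd declaration is dropped and the m declaration ends up on the root node.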

So, here is the code. It is tightly coupled to the XML structure (ugly, but not the ugliest): if the structure changes, the code needs to be updated as well (and I'm not talking about the namespace workarounds here):

@EDIT1: added the for loop that registers the namespaces; the previous version behaved as I described in the 2nd bullet. However, even then it did replace the Xs with Ys.

@EDIT2: commented out the domain attribute value test, so now the value is changed regardless of what it was.

import xml.etree.ElementTree as ET

env_node_name = "Envelope"
body_node_name = "Body"
request_node_name = "request"
domain_attr_name = "domain"
domain_attr_val = "XXXXX"
domain_attr_new_val = "YYYYY"

# Hack: these are the namespace prefixes from the XML file
soap_env_ns_name = "SOAP-ENV"
m_ns_name = "m"
#soap_enc_ns_name = "SOAP-ENC"
#xsi_ns_name = "xsi"
#xsd_ns_name = "xsd"

namespaces_dict = {
    soap_env_ns_name: "http://schemas.xmlsoap.org/soap/envelope/",
    m_ns_name: "http://www.datapower.com/schemas/management",

    # Those are simply ignored by the parser as they're not referenced in our xml.
    #"SOAP-ENC": "http://schemas.xmlsoap.org/soap/encoding/",
    #"xsi": "http://www.w3.org/2001/XMLSchema-instance",
    #"xsd": "http://www.w3.org/2001/XMLSchema",
}


def tag(ns, name):
    return "{" + ns + "}" + name


for key in namespaces_dict.keys():
    ET.register_namespace(key, namespaces_dict[key])

tree = ET.parse("input.xml")
root = tree.getroot()
env_gen = root.iter(tag(namespaces_dict[soap_env_ns_name], env_node_name))
try:
    for env in env_gen:
        body_gen = env.iter(tag(namespaces_dict[soap_env_ns_name], body_node_name))
        try:
            for body in body_gen:
                request_gen = body.iter(tag(namespaces_dict[m_ns_name], request_node_name))
                try:
                    for request in request_gen:
                        if domain_attr_name in request.keys():
                            # Now, I didn't fully understand the question:
                            # you want to change the value of the 'domain' attribute (in your xml example: "XXXXX") to - let's say - "YYYYY"  (as my code does) on one of the 2 below cases:
                            # 1: change it only if current value is "XXXXX"
                            # 2: change it regardless of the current value
                            # if it's 1, then that's OK, but if it's 2, you'll have to comment the very below 'if domain_attr_val ...' line (prepend it by a # - just like the current one)
                            #if domain_attr_val == request.get(domain_attr_name):
                            request.set(domain_attr_name, domain_attr_new_val)
                except StopIteration:
                    print("Done iterating on '%s' node" % request_node_name)
        except StopIteration:
            print("Done iterating on '%s' node" % body_node_name)
except StopIteration:
    print("Done iterating on '%s' node" % env_node_name)

tree.write("output.xml")
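
If the nested loops feel heavy, the same lookup can probably be written more compactly with `findall` and a prefix-to-URI mapping. This is only a sketch, not a drop-in replacement: `set_domain`, `NAMESPACES` and `SAMPLE` are hypothetical names, and it assumes the request node is always a direct child of Body, as in the question's XML.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "SOAP-ENV": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "http://www.datapower.com/schemas/management",
}

# Shortened, hypothetical version of the question's document.
SAMPLE = (
    '<SOAP-ENV:Envelope '
    'xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    '<SOAP-ENV:Body>'
    '<m:request xmlns:m="http://www.datapower.com/schemas/management" '
    'domain="XXXXX"/>'
    '</SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>'
)


def set_domain(root, new_value):
    """Set 'domain' on every m:request directly under SOAP-ENV:Body.

    Only elements that already carry the attribute are touched.
    Returns the number of elements changed.
    """
    changed = 0
    for request in root.findall("SOAP-ENV:Body/m:request", NAMESPACES):
        if "domain" in request.attrib:
            request.set("domain", new_value)
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)  # root is the Envelope element
count = set_domain(root, "YYYYY")
request = root.find("SOAP-ENV:Body/m:request", NAMESPACES)
print(count, request.get("domain"))
```

With a file, `root` would come from `ET.parse("input.xml").getroot()`, and the tree would be written back with `tree.write("output.xml")` after registering the namespaces, as in the code above.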
CristiFati
  • Thank you CristiFati for this explanation. I get this error: Traceback (most recent call last): File "/Users/majid/PycharmProjects/changeXmlAttribute/changeSOAPattr.py", line 32, in <module> tree = ET.parse("soap.xml") File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/xml/etree/ElementTree.py", line 1242, in parse tree.parse(source, parser) File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/xml/etree/ElementTree.py", line 1730, in parse self._root = parser._parse(source) xml.etree.ElementTree.ParseError: no element found: line 13, column 16 – Bouchaib Mounir Oct 23 '15 at 05:43
  • Well, the error occurs when the engine tries to parse the XML, meaning that the XML is malformed. Check the _input.xml_ file for errors (maybe using a web browser). – CristiFati Oct 23 '15 at 08:37
  • The XML was missing : , but your code did not change "XXXXX" to "YYYYY"; the XML file is still the same. – Bouchaib Mounir Oct 23 '15 at 10:19
  • Did you have the chance to test this new version? It is working for me. If it isn't working for you, can you paste the contents of _output.xml_? – CristiFati Oct 25 '15 at 09:48
  • Thanks CristiFati, it's working now, but I don't want it to depend on the value XXXXX; it should just change whatever is in domain to a new value. – Bouchaib Mounir Oct 26 '15 at 14:11
  • OK, then simply comment out the line `if domain_attr_val == request.get(domain_attr_name):` (as specified in the comments in my code); that way it will change the domain value to the value of `domain_attr_new_val` (currently _YYYYY_) regardless of its current value. Optionally, you can unindent the next line, `request.set(domain_attr_name, domain_attr_new_val)`, by 4 spaces, but that's only a matter of code formatting; it will still work even if you don't. – CristiFati Oct 26 '15 at 14:21
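
For the ParseError discussed in the comments above, one way to find where a file stops being well-formed is to catch the exception and read its `position` attribute. This is a small sketch: `check_well_formed` is a hypothetical helper, and the two strings are made-up inputs rather than the asker's file.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else the (line, column) of the error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return exc.position
    return None


good = "<a><b/></a>"
truncated = "<a>\n  <b>"  # the closing tags are missing

print(check_well_formed(good))
print(check_well_formed(truncated))
```

A truncated document like the second one is a common cause of the "no element found" message shown in the traceback.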