A few words first:
- Parsing the XML (from a file) and then writing it to another file does not yield the same result, because the parser alters it. You can verify this by running the code that you posted (without the 3rd line, obviously):
import xml.etree.ElementTree as etree
tree = etree.parse('input.xml')
tree.write('output.xml')
- All the SOAP-ENV:* nodes have been converted to ns0:*, and the m:* nodes to ns1:*. To work around that I had to copy the prefixes from the XML file into the code (the soap_env_ns_name and m_ns_name variables), as explained here: "Saving XML using ETree in Python. It's not retaining namespaces, and adding ns0, ns1 and removing xmlns tags".
- SOAP-ENC and the default namespaces (xsi and xsd) have been removed, since they are not referenced in the XML. Also, the m declaration has been moved from the request node to the Envelope (root) node; I'm not sure whether that's part of a standard, but in most XML documents I've seen the namespaces are declared on the root node. Either way, there's nothing you can do about it here: Python's parser is not very smart.
- Bottom line: you won't get the exact same output (unless you want to write your own parser, as described here: "Python: Update XML-file using ElementTree while conserving layout as much as possible").
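The behaviour described in the bullets above can be reproduced without any files. The following is only a minimal sketch: the SAMPLE document is a hypothetical stand-in modelled on the namespaces mentioned in this answer, not the actual input.xml. It serializes the same tree before and after calling ET.register_namespace, whose registry is global for the whole process.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_URI = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC_URI = "http://schemas.xmlsoap.org/soap/encoding/"
M_URI = "http://www.datapower.com/schemas/management"

# Hypothetical stand-in for input.xml: SOAP-ENC is declared but never used,
# and the m prefix is declared on the request node rather than on the root.
SAMPLE = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="%s" xmlns:SOAP-ENC="%s">'
    '<SOAP-ENV:Body>'
    '<m:request xmlns:m="%s" domain="XXXXX"/>'
    '</SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>'
) % (SOAP_ENV_URI, SOAP_ENC_URI, M_URI)

root = ET.fromstring(SAMPLE)

# No prefixes registered yet: the serializer invents ns0, ns1, ...
before = ET.tostring(root, encoding="unicode")

# Register the original prefixes, then serialize the very same tree again.
ET.register_namespace("SOAP-ENV", SOAP_ENV_URI)
ET.register_namespace("m", M_URI)
after = ET.tostring(root, encoding="unicode")

# Start tag of the root element, to see which declarations ended up there.
after_root_tag = after[: after.index(">") + 1]

print(before)
print(after)
```

In both outputs the unused SOAP-ENC declaration is gone and the declaration for the m namespace sits on the root element; only the prefixes differ between the two.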
So, here goes. The code is tightly coupled to the XML structure (ugly, but not the ugliest); if the structure changes, the code needs to be updated as well (and I'm not talking about the namespace workarounds here):
@EDIT1: added the for loop that registers the namespaces; the previous version behaved as I described in the 2nd bullet (prefixes rewritten to ns0, ns1). Even so, running it did replace the Xs with Ys.
@EDIT2: commented out the domain attribute value test, so the value is now changed regardless of its current value.
import xml.etree.ElementTree as ET
env_node_name = "Envelope"
body_node_name = "Body"
request_node_name = "request"
domain_attr_name = "domain"
domain_attr_val = "XXXXX"
domain_attr_new_val = "YYYYY"
# Workaround: these are the namespace prefixes from the XML file
soap_env_ns_name = "SOAP-ENV"
m_ns_name = "m"
#soap_enc_ns_name = "SOAP-ENC"
#xsi_ns_name = "xsi"
#xsd_ns_name = "xsd"
namespaces_dict = {
    soap_env_ns_name: "http://schemas.xmlsoap.org/soap/envelope/",
    m_ns_name: "http://www.datapower.com/schemas/management",
    # Those are simply ignored by the parser as they're not referenced in our xml.
    #"SOAP-ENC": "http://schemas.xmlsoap.org/soap/encoding/",
    #"xsi": "http://www.w3.org/2001/XMLSchema-instance",
    #"xsd": "http://www.w3.org/2001/XMLSchema",
}
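def tag(ns, name):
    # Build ElementTree's qualified tag form: "{namespace-uri}local-name"
    return "{" + ns + "}" + name
# Register the original prefixes so the writer does not emit ns0, ns1, ...
for prefix, uri in namespaces_dict.items():
    ET.register_namespace(prefix, uri)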
tree = ET.parse("input.xml")
root = tree.getroot()
env_gen = root.iter(tag(namespaces_dict[soap_env_ns_name], env_node_name))
# Note: a for loop absorbs StopIteration itself, so the except branches below never run.
try:
    for env in env_gen:
        body_gen = env.iter(tag(namespaces_dict[soap_env_ns_name], body_node_name))
        try:
            for body in body_gen:
                request_gen = body.iter(tag(namespaces_dict[m_ns_name], request_node_name))
                try:
                    for request in request_gen:
                        if domain_attr_name in request.keys():
                            # I didn't fully understand the question. The 'domain' attribute
                            # (in your xml example: "XXXXX") can be changed to "YYYYY" either:
                            # 1: only if the current value is "XXXXX", or
                            # 2: regardless of the current value (what the code does now).
                            # For case 1, uncomment the 'if' line below and indent the
                            # 'request.set(...)' call under it.
                            #if domain_attr_val == request.get(domain_attr_name):
                            request.set(domain_attr_name, domain_attr_new_val)
                except StopIteration:
                    print("Done iterating on '%s' node" % request_node_name)
        except StopIteration:
            print("Done iterating on '%s' node" % body_node_name)
except StopIteration:
    print("Done iterating on '%s' node" % env_node_name)
tree.write("output.xml")
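Since iter() already searches the whole subtree, the nested loops can be collapsed when the position of the request node does not need to be checked. This is only a sketch under that assumption, not a drop-in replacement: set_domain and SAMPLE are hypothetical names, and the inline document stands in for input.xml.

```python
import xml.etree.ElementTree as ET

SOAP_ENV_URI = "http://schemas.xmlsoap.org/soap/envelope/"
M_URI = "http://www.datapower.com/schemas/management"

# Hypothetical stand-in for input.xml.
SAMPLE = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="%s">'
    '<SOAP-ENV:Body>'
    '<m:request xmlns:m="%s" domain="XXXXX"/>'
    '</SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>'
) % (SOAP_ENV_URI, M_URI)


def set_domain(root, new_value):
    """Set 'domain' on every m:request element that has it; return the count."""
    changed = 0
    # iter() walks the root and all of its descendants in document order.
    for request in root.iter("{%s}request" % M_URI):
        if "domain" in request.attrib:
            request.set("domain", new_value)
            changed += 1
    return changed


# Keep the original prefixes in the serialized output.
ET.register_namespace("SOAP-ENV", SOAP_ENV_URI)
ET.register_namespace("m", M_URI)

root = ET.fromstring(SAMPLE)
changed = set_domain(root, "YYYYY")
result = ET.tostring(root, encoding="unicode")
print(changed, result)
```

The trade-off is that this version matches m:request elements anywhere in the document, not only those nested under Envelope and Body, so it is looser than the nested loops above.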