Strip attributes / namespaces from SOAP XML

Question

If I have several tags like this: <ServiceId xsi:type="xsd:string">aval</ServiceId>

Is xsi:type="xsd:string" technically an attribute?

When I try this:

from StringIO import StringIO
from SOAPpy.wstools.Utility import DOM
badxml = '''<?xml version="1.0" encoding="utf-8"?>
         <ServiceId xsi:type="xsd:string">aval</ServiceId>'''
document = DOM.loadDocument(StringIO(badxml))
orig_len = len(document.childNodes[0].toxml())
for node in document.childNodes:
    node.removeAttribute('xsi:type')
new_len = len(node.toxml())
diff = orig_len - new_len
print diff

...I get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/SOAPpy/wstools/Utility.py", line 572, in loadDocument
    return xml.dom.minidom.parse(data)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
    return expatbuilder.parse(file)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 930, in parse
    result = builder.parseFile(file)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: unbound prefix: line 2, column 9

I basically want to remove all attributes from large XML documents.

score 0 · Answer 1 · answered Aug 25 '17 at 22:43

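xsi is a namespace prefix, and yes, xsi:type="xsd:string" is technically an attribute; its name just belongs to that namespace. The "unbound prefix" error appears because your snippet never declares the prefix (with an xmlns:xsi="..." declaration), so the parser rejects the document before your loop runs. You can use namespaces in your queries if you need them. Removing them can corrupt your results, for example when other XML elements share the same element name but belong to a different namespace.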
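To illustrate the point about the xsi prefix, here is a minimal sketch using the standard library's xml.dom.minidom rather than SOAPpy. It assumes the real SOAP document declares the xsi and xsd prefixes somewhere (the snippet in the question does not, so I added the declarations by hand), and strip_attributes is a hypothetical helper name, not part of any library.

```python
import xml.dom.minidom

# Assumption: the full document declares the xsi/xsd prefixes. Without the
# xmlns:xsi declaration, expat raises "unbound prefix" before any of your
# own code gets a chance to run.
XML = '''<ServiceId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema"
           xsi:type="xsd:string">aval</ServiceId>'''


def strip_attributes(node):
    """Recursively remove every attribute from element nodes (hypothetical helper)."""
    if node.nodeType == node.ELEMENT_NODE:
        # Copy the names first: we mutate the attribute map while looping.
        for name in list(node.attributes.keys()):
            node.removeAttribute(name)
    for child in node.childNodes:
        strip_attributes(child)


document = xml.dom.minidom.parseString(XML)
strip_attributes(document)
result = document.documentElement.toxml()
print(result)
```

Note that this also removes the xmlns declarations themselves, since the DOM exposes them as attributes. That is harmless here because nothing prefixed is left, but it would produce unbound prefixes on serialization if any element names still used a prefix. The recursion is fine for a sketch; for very deep documents an iterative walk may be safer.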

Have a look here: "Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method 'find', 'findall'"

Otherwise, what you are doing is a bit of a hack. You might as well read the file as a string and do a mass regex replace on the namespace string you want to delete (not recommended).
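If you do decide to strip everything, a parser-based pass is less fragile than a regex. The following is a rough sketch with xml.etree.ElementTree, assuming Python 3 and a document whose prefixes are all declared; the sample envelope and the function name strip_namespaces_and_attributes are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample input; a real SOAP message will have more elements.
SOAP = '''<soapenv:Envelope
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soapenv:Body>
    <ServiceId xsi:type="xsd:string">aval</ServiceId>
  </soapenv:Body>
</soapenv:Envelope>'''


def strip_namespaces_and_attributes(root):
    """Drop the '{uri}' part of every tag and clear all attributes in place."""
    for elem in root.iter():
        # ElementTree stores qualified names as '{namespace-uri}localname'.
        if isinstance(elem.tag, str) and elem.tag.startswith('{'):
            elem.tag = elem.tag.split('}', 1)[1]
        elem.attrib.clear()
    return root


root = strip_namespaces_and_attributes(ET.fromstring(SOAP))
cleaned = ET.tostring(root, encoding='unicode')
service_id = root.find('./Body/ServiceId')
print(cleaned)
```

After this pass, plain find/findall paths work without namespace maps, at the cost mentioned above: elements that differed only by namespace become indistinguishable.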
