2

I have an xml-document that looks like this:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="http://someurl/Oldschema"
     xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
     xmlns:framework="http://someurl/Oldframework">
   <framework:tag1> ... </framework:tag1>
   <framework:tag2> <tagA> ... </tagA> </framwork:tag2>
</root>

All I want to do is change http://someurl/Oldschema to http://someurl/Newschema and http://someurl/Oldframework to http://someurl/Newframework and leave the remaining document unchanged. With some insights from this thread lxml: add namespace to input file, I tried the following:

def fix_nsmap(nsmap, tag):
    """update the old nsmap-dict with the new schema-urls. Example:
    fix_nsmap({"framework": "http://someurl/Oldframework",
               None: "http://someurl/Oldschema"}) ==
      {"framework": "http://someurl/Newframework",
       None: "http://someurl/Newschema"}"""
    ...

from lxml import etree
root = etree.parse(XMLFILE).getroot()
root_tag = root.tag.split("}")[1]
nsmap = fix_nsmap(root.nsmap)
new_root = etree.Element(root_tag, nsmap=nsmap)
new_root[:] = root[:]
# ... fix xsi:schemaLocation
return etree.tostring(new_root, pretty_print=True, encoding="UTF-8",
    xml_declaration=True) 

This produces the right 'attributes' in the root-tag but completely fails for the rest of the document:

<network xmlns:framework="http://someurl/Newframework"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://someurl/Newschema"
    xsi:schemaLocation="http://someurl/Newschema Schema.xsd">
<ns0:tag1 xmlns:ns0="http://someurl/Oldframework"> ... </ns0:information>
<ns1:tag2 xmlns:ns1="http://someurl/Oldframework"
          xmlns:ns2="http://someurl/Oldschema">
    <ns2:tagA> ... </ns2:tagA>
</ns1:tag2>

What is wrong with my approach? Is there any other way to change the namespaces? Maybe I could use xslt?

Thanks!

Denis

Community
  • 1
  • 1
dassmann
  • 327
  • 1
  • 6
  • 13
  • My answer is not very popular, so as an alternative I would like to add this: yes, I think you could use XSLT. See https://stackoverflow.com/a/31870245/407651 and https://stackoverflow.com/a/51660868/407651. – mzjn Sep 18 '18 at 15:32

1 Answers1

1

All I want to do is change http://someurl/Oldschema to http://someurl/Newschema and http://someurl/Oldframework to http://someurl/Newframework and leave the remaining document unchanged.

I'd do a simple textual search-and-replace operation. It's much easier than fiddling with XML nodes. Like this:

with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace("http://someurl/Oldschema", "http://someurl/Newschema")
    data = data.replace("http://someurl/Oldframework", "http://someurl/Newframework")
    outfile.write(data)

The other question that you were inspired by is about adding a new namespace (and keeping the old ones). But you are trying to modify existing namespace declarations. Creating a new root element and copying the child nodes does not work in this case.

This line:

new_root[:] = root[:]

turns the children of the original root element into children of the new root element. But these child nodes are still associated with the old namespaces. So they have to be modified/recreated too. I guess it might be possible to come up with a reasonable way to do that, but I don't think you need it. Textual search-and-replace is good enough, IMHO.

Community
  • 1
  • 1
mzjn
  • 48,958
  • 13
  • 128
  • 248
  • This is how I 'fix' these files right now (using sed) but I was hopen there would be a more elegant python solution. Thanks anyway! – dassmann Jan 07 '14 at 22:47
  • As long as the namespace mapping cannot be changed (see https://bugs.launchpad.net/lxml/+bug/555602), I think it's hard to come up with something more compact or elegant than good old search-and-replace. – mzjn Jan 08 '14 at 09:47
  • 1
    It would be nice if downvoters left a comment to explain what is wrong with this answer. – mzjn Feb 04 '18 at 10:29
  • I think the issue with the answer is that it is hardcoded. No one appreciates hardcoded stuff. This is my opinion on why your answer was downvoted. I am working on this issue as well and will let you know if i find something generic. – Rahul Sep 18 '18 at 14:33
  • @Rahul: My answer may not be very elegant but it gets the job done. The only other idea that I can think of is using XSLT (see https://stackoverflow.com/a/51660868/407651 and https://stackoverflow.com/a/31870245/407651). But I'm not sure how that would be any less "hardcoded". – mzjn Sep 18 '18 at 15:43
  • @mzjn Since, you were wondering what was "wrong" with you answer, i simply put formward my opinion on it and yes it does get the job done! – Rahul Sep 18 '18 at 16:27
  • @Rahul: Thanks. If you come up with something better, I'm interested! – mzjn Sep 18 '18 at 16:32
  • Sure! i am not close to being average in coding.. so dont keep your hopes up :) – Rahul Sep 18 '18 at 16:33