65

I'm writing simple Python (3.2) code that reads XML files, makes some corrections, and stores them back. However, when the file is written, ElementTree adds an `ns0:` namespace prefix to every tag. For example:

<ns0:trk>
  <ns0:name>ACTIVE LOG</ns0:name>
<ns0:trkseg>
<ns0:trkpt lat="38.5" lon="-120.2">
  <ns0:ele>6.385864</ns0:ele>
  <ns0:time>2011-12-10T17:46:30Z</ns0:time>
</ns0:trkpt>
<ns0:trkpt lat="40.7" lon="-120.95">
  <ns0:ele>5.905273</ns0:ele>
  <ns0:time>2011-12-10T17:46:51Z</ns0:time>
</ns0:trkpt>
<ns0:trkpt lat="43.252" lon="-126.453">
  <ns0:ele>7.347168</ns0:ele>
  <ns0:time>2011-12-10T17:52:28Z</ns0:time>
</ns0:trkpt>
</ns0:trkseg>
</ns0:trk>

The code snippet is below:

def parse_gpx_data(gpxdata, tzname=None, npoints=None, filter_window=None,
                   output_file_name=None):
    ET = load_xml_library()

    def find_trksegs_or_route(etree, ns):
        trksegs=etree.findall('.//'+ns+'trkseg')
        if trksegs:
            return trksegs, "trkpt"
        else: # try to display route if track is missing
            rte=etree.findall('.//'+ns+'rte')
            return rte, "rtept"

    # try GPX10 namespace first
    try:
        element = ET.XML(gpxdata)
    except ET.ParseError as v:
        row, column = v.position
        print("error on row %d, column %d: %s" % (row, column, v))
        return  # nothing to process if the input could not be parsed

    print ("%s" % ET.tostring(element))
    trksegs,pttag=find_trksegs_or_route(element, GPX10)
    NS=GPX10
    if not trksegs: # try GPX11 namespace otherwise
        trksegs,pttag=find_trksegs_or_route(element, GPX11)
        NS=GPX11
    if not trksegs: # try without any namespace
        trksegs,pttag=find_trksegs_or_route(element, "")
        NS=""

    # Store the results if requested
    if output_file_name:
        ET.register_namespace('', GPX11)
        ET.register_namespace('', GPX10)
        ET.ElementTree(element).write(output_file_name, xml_declaration=True)

    return

I have tried using `register_namespace`, but without success. Is there anything specific to this version, ElementTree 1.3, that I need to change?

Rik Poggi
ilya1725
  • Tell me if I understood your question: you'd like to have `<trk>` instead of `<ns0:trk>`, and so on? – Rik Poggi Jan 24 '12 at 08:40
  • Correct. I'd like to have `<trk>` instead of `<ns0:trk>`, and so on. – ilya1725 Jan 24 '12 at 16:27
  • This is not a real solution, but since it seems that you load a string, have you tried removing the namespace with a regexp? After that, if you load and save without it, everything should be OK. – Rik Poggi Jan 24 '12 at 18:01
  • Hi Rik. I'll do that if everything else fails. I'd like to configure ElementTree not to print it in the first place. – ilya1725 Jan 24 '12 at 20:29

5 Answers

96

To avoid the `ns0` prefix, the default namespace should be set before reading the XML data.

ET.register_namespace('', "http://www.topografix.com/GPX/1/1")
ET.register_namespace('', "http://www.topografix.com/GPX/1/0")
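
A minimal sketch of this approach, assuming a GPX 1.1 input (the `SAMPLE_GPX` string is made up for illustration). As far as I can tell from the documentation, `register_namespace` is global and removes any existing mapping for the same prefix, so of two calls that both register `''`, only the last one stays in effect:

```python
import xml.etree.ElementTree as ET

GPX11 = "http://www.topografix.com/GPX/1/1"  # URI taken from the answer above

# Hypothetical sample input, only for illustration.
SAMPLE_GPX = (
    '<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1">'
    '<trk><name>ACTIVE LOG</name></trk>'
    '</gpx>'
)

# Register the default (empty) prefix for the namespace the file really uses.
# The mapping is process-wide; a later call with '' and a different URI
# would replace this one.
ET.register_namespace('', GPX11)

root = ET.fromstring(SAMPLE_GPX)

# Serialization consults the registered prefixes, so tags come out unprefixed.
result = ET.tostring(root, encoding='unicode')
print(result)
```

If the input may be either GPX 1.0 or GPX 1.1, one option is to register only the URI that the parsed root element actually carries.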
ilya1725
  • Looks like it doesn't have to be before. I'm able to read the XML file, get the namespace, and only after that call register_namespace: `tree = ET.parse(str(udx_path))`, `root = tree.getroot()`, `ns = {'udx': root.tag[1:root.tag.index('}')]}` (extracts the namespace of the root element), `ET.register_namespace('', root.tag[1:root.tag.index('}')])` – likern May 03 '17 at 19:16
  • This is not a complete way to avoid differences between the parsed input and the output string (if using `ElementTree.tostring(root)`). singingsingh's answer is complete. – Emil Apr 18 '18 at 12:01
  • Just registering before printing can be good enough. – Instein Apr 19 '22 at 23:59
  • Singingsingh's explanation is more appropriate to the question. – Aananth C N Aug 15 '22 at 17:57
50

You need to register all your namespaces before you parse the XML file.

For example, suppose your input XML looks like this, with `Capabilities` as the root of your element tree:

<Capabilities xmlns="http://www.opengis.net/wmts/1.0"
    xmlns:ows="http://www.opengis.net/ows/1.1"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:gml="http://www.opengis.net/gml"
    xsi:schemaLocation="http://www.opengis.net/wmts/1.0 http://schemas.opengis.net/wmts/1.0/wmtsGetCapabilities_response.xsd"
    version="1.0.0">

Then you have to register all the namespaces, i.e. every attribute declared with `xmlns`, like this:

ET.register_namespace('', "http://www.opengis.net/wmts/1.0")
ET.register_namespace('ows', "http://www.opengis.net/ows/1.1")
ET.register_namespace('xlink', "http://www.w3.org/1999/xlink")
ET.register_namespace('xsi', "http://www.w3.org/2001/XMLSchema-instance")
ET.register_namespace('gml', "http://www.opengis.net/gml")
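
Rather than listing the namespaces by hand, they can probably be collected from the document itself. The sketch below is one way to do that, using the `start-ns` events of `ET.iterparse`; the helper name `register_all_namespaces` and the shortened `SAMPLE` document are assumptions made for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the document above.
SAMPLE = (
    '<Capabilities xmlns="http://www.opengis.net/wmts/1.0" '
    'xmlns:ows="http://www.opengis.net/ows/1.1" version="1.0.0">'
    '<ows:Title>demo</ows:Title>'
    '</Capabilities>'
)

def register_all_namespaces(xml_text):
    """Register every prefix/URI pair declared in xml_text and return them."""
    found = {}
    source = io.BytesIO(xml_text.encode('utf-8'))
    # Each 'start-ns' event carries a (prefix, uri) tuple; the default
    # namespace is reported with an empty prefix.
    for _event, (prefix, uri) in ET.iterparse(source, events=('start-ns',)):
        # Note: register_namespace rejects prefixes of the form ns<digits>.
        ET.register_namespace(prefix, uri)
        found[prefix] = uri
    return found

namespaces = register_all_namespaces(SAMPLE)
root = ET.fromstring(SAMPLE)
result = ET.tostring(root, encoding='unicode')
print(result)
```

This reads the input twice, which is simple but may not suit very large files.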
singingsingh
2

If you try to print the root, you will see something like this: `<Element '{http://www.host.domain/path/to/your/xml/namespace}RootTag' at 0x0000000000558DB8>`

So, to avoid the ns0 prefix, you have to change the default namespace before parsing the XML data as below:

ET.register_namespace('', "http://www.host.domain/path/to/your/xml/namespace")
1

Or you could regex it away:

import re

def remove_xml_namespace(xml_str: str) -> str:
    xml_str = re.sub(r"<([^:]+):(\w+).+(?=xmlns)[^>]+>([\s\S]*)</(\1):(\2)>", r"\3", xml_str)
    # replace namespace elements from end tag
    xml_str = re.sub(r"</[^:]*:", r"</", xml_str)
    # replace namespace from start tags
    xml_str = re.sub(r"<[^/][^:]*:([^/>]*)(/?)>", r"<\1\2>", xml_str)
    return xml_str
1

It seems that you have to declare your namespace, meaning that you need to change the first line of your XML from:

<ns0:trk>

to something like:

<ns0:trk xmlns:ns0="uri:">

Once you have done that, you will no longer get the ParseError for unbound prefix: ..., and:

elem.tag = elem.tag[len('{uri:}'):]

will remove the namespace.
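
To apply the same idea to a whole tree, something like the following sketch could be used. The helper name `strip_namespaces` is hypothetical, and the sample reuses the `uri:` placeholder from above; namespaced attribute names are not handled here:

```python
import xml.etree.ElementTree as ET

SAMPLE = '<ns0:trk xmlns:ns0="uri:"><ns0:name>ACTIVE LOG</ns0:name></ns0:trk>'

def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag, in place."""
    for elem in root.iter():
        # Parsed tags look like '{uri}local'; keep only the local name.
        if isinstance(elem.tag, str) and elem.tag.startswith('{'):
            elem.tag = elem.tag.split('}', 1)[1]
    return root

root = strip_namespaces(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding='unicode')
print(result)
```

Keep in mind that the output is no longer in the original namespace, so consumers that expect namespaced GPX may reject it.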

Rik Poggi
  • Hi Rik. The example XML I showed is the _output_. The input XML, which parses fine, doesn't have the 'ns0:' prefix. It is just standard GPX code. – ilya1725 Jan 24 '12 at 21:50
  • If the line `element = ET.XML(gpxdata)` gives you an element with `ns0`, then the "problem" is in gpxdata, in which case you have two options: "fix" the gpxdata, or find out why the standard parser does that and build a new one for [`ET.XML`](http://docs.python.org/py3k/library/xml.etree.elementtree.html#xml.etree.ElementTree.XML). – Rik Poggi Jan 24 '12 at 22:30
  • The original gpxdata doesn't have any `ns0` entries. However, your hint, Rik, kind of led me to the solution. Basically, `ET.register_namespace('', GPX11)` and `ET.register_namespace('', GPX10)` should be called before reading, i.e. before `ET.XML`. – ilya1725 Jan 25 '12 at 06:31