What is the best way to handle the lack of a namespace on some of the nodes in an xml document using lxml? Should I first modify all None named nodes to add the "gmd" name and then change the tree attributes to name http://www.isotc211.org/2005/gmd as "gmd"? If so, is there a clean way to do this with lxml or something else that would be relatively clean/safe?

from lxml import etree

# Assumes the catalog linked below has been saved locally under this name.
charts_tree = etree.parse('ENCProdCat_19115.xml').getroot()

nsmap = charts_tree.nsmap
nsmap.pop(None)  # without this, xpath() raises:
# TypeError: empty namespace prefix is not supported in XPath
len(charts_tree.xpath('//*/gml:Polygon', namespaces=nsmap))
# 1180
len(charts_tree.xpath('//*/DS_DataSet', namespaces=nsmap))
# 0 ... Bummer!
len(charts_tree.xpath('//*/DS_DataSet'))
# 0 ... Also a bummer

e.g. http://www.charts.noaa.gov/ENCs/ENCProdCat_19115.xml

<DS_Series xmlns="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gsr="http://www.isotc211.org/2005/gsr" xmlns:gss="http://www.isotc211.org/2005/gss" xmlns:gts="http://www.isotc211.org/2005/gts" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://schemas.opengis.net/iso/19139/20070417/gmd/gmd.xsd">
<composedOf>
    <DS_DataSet>
        <has>
            <MD_Metadata>
                <parentIdentifier>
                    <gco:CharacterString>NOAA ENC Product Catalog</gco:CharacterString>
                </parentIdentifier>
...
<EX_BoundingPolygon>
    <polygon>
        <gml:Polygon gml:id="US1AK90M_P1">
            <gml:exterior>
                <gml:LinearRing>
                    <gml:pos>67.61505 -178.99979</gml:pos>
                    <gml:pos>73.99999 -178.99979</gml:pos>
...
                    <gml:pos>64.99997 -178.99979</gml:pos>
                    <gml:pos>67.61505 -178.99979</gml:pos>
                </gml:LinearRing>
Kurt Schwehr

1 Answer

I believe your DS_DataSet does carry a namespace: it sits inside DS_Series, which declares "http://www.isotc211.org/2005/gmd" as the default namespace, and child elements without a prefix inherit that default.

Try mapping that URI to a prefix in your namespace dictionary. You can print the dictionary first to check whether it is already there; if not, add it and refer to the namespace by your new key.

nsmap['some_ns'] = "http://www.isotc211.org/2005/gmd"
len(charts_tree.xpath('//*/some_ns:DS_DataSet', namespaces=nsmap))

Which becomes:

nsmap['gmd'] = nsmap[None]
nsmap.pop(None)
len(charts_tree.xpath('//*/gmd:DS_DataSet',namespaces=nsmap))
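
To see the remapping end to end without downloading the catalog, here is a minimal sketch against a tiny stand-in document. Only the two namespace URIs come from the question; the XML snippet, the id "P1" and the variable names are made up for illustration.

```python
from lxml import etree

# Tiny stand-in for the catalog (hypothetical content); only the
# namespace URIs are taken from the question.
XML = b"""<DS_Series xmlns="http://www.isotc211.org/2005/gmd"
           xmlns:gml="http://www.opengis.net/gml/3.2">
  <composedOf>
    <DS_DataSet>
      <polygon>
        <gml:Polygon gml:id="P1"/>
      </polygon>
    </DS_DataSet>
  </composedOf>
</DS_Series>"""

root = etree.fromstring(XML)

# Element.nsmap builds a new dict on each access, so editing this copy
# does not change the document itself.
nsmap = root.nsmap
print(nsmap)  # the default namespace is stored under the key None

# XPath 1.0 has no default namespace, so move the URI to a real prefix.
nsmap['gmd'] = nsmap.pop(None)

unprefixed = root.xpath('//DS_DataSet', namespaces=nsmap)
prefixed = root.xpath('//gmd:DS_DataSet', namespaces=nsmap)
polygons = root.xpath('//gml:Polygon', namespaces=nsmap)

print(len(unprefixed), len(prefixed), len(polygons))  # 0 1 1
```

The unprefixed query finds nothing because, in XPath, a bare name only matches elements that are in no namespace at all. The prefix used in the query ("gmd" here) is arbitrary; only the URI it maps to has to match the document.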
Jim Garrison
Alan Hynes
  • Stumbled on this write-up that touches on why the XPath doesn't work: http://goodmami.org/2015/11/04/python-xpath-and-default-namespaces.html – ghukill Jun 22 '18 at 17:43
  • @ghukill Taken from the link, for posterity: "As far as I know, there is not a good solution to this problem. Even lxml is aware of the problem but suggests modifying the document with a bogus prefix. The design of ElementPath seems to selectively choose which parts of the XPath spec to implement (when considered in conjunction with the model that ElementTree provides). In its defense, the XPath spec itself is lacking in some ways." – Edward Feb 04 '21 at 13:07
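
If inventing a prefix feels like a workaround, two prefix-free options may be worth a look. The sketch below again uses a made-up stand-in document (only the gmd URI is from the question): it queries with the `{uri}tag` form that `findall()` and `iter()` accept, and with an XPath `local-name()` test that ignores namespaces altogether.

```python
from lxml import etree

GMD = "http://www.isotc211.org/2005/gmd"

# Hypothetical stand-in document; only the namespace URI is from the question.
XML = b"""<DS_Series xmlns="http://www.isotc211.org/2005/gmd">
  <composedOf>
    <DS_DataSet/>
    <DS_DataSet/>
  </composedOf>
</DS_Series>"""

root = etree.fromstring(XML)

# find()/findall()/iter() accept fully qualified "{uri}tag" names,
# so no prefix map is needed.
clark_hits = root.findall('.//{%s}DS_DataSet' % GMD)
iter_hits = list(root.iter('{%s}DS_DataSet' % GMD))

# XPath alternative: match on the local name only. This also matches
# same-named elements from any other namespace, so use it with care.
local_hits = root.xpath('//*[local-name() = "DS_DataSet"]')

print(len(clark_hits), len(iter_hits), len(local_hits))  # 2 2 2
```

The `{uri}tag` form is exact but verbose; `local-name()` is shorter but drops the namespace check, which matters if the document mixes vocabularies that share element names.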