
I have an XML document and I want to extract a subnode (boundedBy) and pretty-print it exactly as it looks in the original document (apart from the pretty formatting).

<?xml version="1.0" encoding="UTF-8" ?>
<wfs:FeatureCollection
   xmlns:sei="https://somedomain.com/namespace"
   xmlns:wfs="http://www.opengis.net/wfs"
   xmlns:gml="http://www.opengis.net/gml"
   xmlns:ogc="http://www.opengis.net/ogc"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://www.opengis.net/wfs http://schemas.opengis.net/wfs/1.1.0/wfs.xsd 
                       https://somedomain.com/schemas/wfsnamespace some.xsd">
      <gml:boundedBy>
        <gml:Box srsName="EPSG:4326">
            <gml:coordinates>-10.934396,-139.997120 77.396455,-53.627763</gml:coordinates>
        </gml:Box>
      </gml:boundedBy>
    <gml:featureMember>
      <sei:HUB_HEIGHT_FCST>
        <!-- This is the section I want -->
        <gml:boundedBy>
            <gml:Box srsName="EPSG:4326">
                <gml:coordinates>14.574435,-139.997120 14.574435,-139.997120</gml:coordinates>
            </gml:Box>
        </gml:boundedBy>
        <!-- This is the section I want -->
        <sei:geometry_4326>
        <gml:Point srsName="EPSG:4326">
          <gml:coordinates>14.574435,-139.997120</gml:coordinates>
        </gml:Point>
        </sei:geometry_4326>
        <sei:rundatetime>2017-09-26 00:00:00</sei:rundatetime>
        <sei:validdatetime>2017-09-26 17:00:00</sei:validdatetime>
      </sei:HUB_HEIGHT_FCST>
    </gml:featureMember>
</wfs:FeatureCollection>

Here is how I'm extracting the subnode:

from lxml import etree

# parse the XML string
parser = etree.XMLParser(remove_blank_text=True, remove_comments=True, recover=False, strip_cdata=False)
root = etree.fromstring(xmlstr, parser=parser)
# find the subnode I want
subnodes = root.xpath("./gml:boundedBy", namespaces={'gml': 'http://www.opengis.net/gml'})
subnode = subnodes[0]
# make a pretty output (tostring() returns bytes when an encoding is given)
xmlstr = etree.tostring(subnode, xml_declaration=False, encoding="UTF-8", pretty_print=True)
print(xmlstr.decode("UTF-8"))

This gives me the output below. Unfortunately, lxml adds the namespace declarations to the boundedBy node (which makes sense for the sake of completeness in XML).

<gml:boundedBy xmlns:gml="http://www.opengis.net/gml" xmlns:sei="https://somedomain.com/namespace" xmlns:wfs="http://www.opengis.net/wfs" xmlns:ogc="http://www.opengis.net/ogc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <gml:Box srsName="EPSG:4326">
    <gml:coordinates>-10.934396,-139.997120 77.396455,-53.627763</gml:coordinates>
  </gml:Box>
</gml:boundedBy>

I only want the subnode as it looked in the original document.

<gml:boundedBy>
    <gml:Box srsName="EPSG:4326">
        <gml:coordinates>14.574435,-139.997120 14.574435,-139.997120</gml:coordinates>
    </gml:Box>
</gml:boundedBy>

I'm open to not using lxml, but either way I haven't found a way to accomplish this.


edit: Since it was pointed out that I should explain why I want to do this...

I'm trying to log the XML fragment without altering its original structure. The automated test I'm building checks certain nodes for correctness. In the process I'm logging the fragment and want to make it a bit more readable for the person reviewing it. Some of the fragments can get fairly large, which is why pretty_print is so nice.
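
For this logging use case, and assuming lxml is not a hard requirement, one possible sketch uses the standard library's xml.dom.minidom. minidom serializes an element with only the attributes written on that element in the source, so namespace declarations made on the root are not copied onto the fragment. The names `strip_whitespace_nodes`, `pretty_fragment`, and `SAMPLE` are hypothetical, and the sample is a shortened stand-in for the document above. The result is meant for human-readable logs only, since on its own it is not namespace-well-formed.

```python
from xml.dom import minidom

GML_NS = "http://www.opengis.net/gml"

# Shortened stand-in for the document in the question (hypothetical sample data).
SAMPLE = """<wfs:FeatureCollection
   xmlns:sei="https://somedomain.com/namespace"
   xmlns:wfs="http://www.opengis.net/wfs"
   xmlns:gml="http://www.opengis.net/gml">
  <gml:boundedBy>
    <gml:Box srsName="EPSG:4326">
      <gml:coordinates>-10.934396,-139.997120 77.396455,-53.627763</gml:coordinates>
    </gml:Box>
  </gml:boundedBy>
  <gml:featureMember>
    <sei:HUB_HEIGHT_FCST>
      <gml:boundedBy>
        <gml:Box srsName="EPSG:4326">
          <gml:coordinates>14.574435,-139.997120 14.574435,-139.997120</gml:coordinates>
        </gml:Box>
      </gml:boundedBy>
    </sei:HUB_HEIGHT_FCST>
  </gml:featureMember>
</wfs:FeatureCollection>"""


def strip_whitespace_nodes(node):
    """Remove whitespace-only text nodes so toprettyxml() can re-indent cleanly."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child)


def pretty_fragment(xml_text, namespace, local_name, index=0):
    """Return the index-th matching element (document order), re-indented."""
    doc = minidom.parseString(xml_text)
    node = doc.getElementsByTagNameNS(namespace, local_name)[index]
    strip_whitespace_nodes(node)
    # Only attributes present on each element in the source are written out.
    return node.toprettyxml(indent="    ")


if __name__ == "__main__":
    # index=1 selects the boundedBy nested inside featureMember.
    print(pretty_fragment(SAMPLE, GML_NS, "boundedBy", index=1))
```

Dropping whitespace-only text nodes is safe for data-oriented documents like this one, but it would change mixed-content documents where whitespace is significant.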

Marcel Wilson
  • You're asking for the library to help you create "XML" that's *not* [**namespace-well-formed**](https://stackoverflow.com/a/25830482/290085). It's not going to help you do that, and you shouldn't be trying to do that. – kjhughes Sep 26 '17 at 22:26
  • ...but if you were only really wishing that the *unused* namespace declarations not be included, then your request would be more reasonable. Their being there is not wrong -- just unnecessary and arguably unsightly. – kjhughes Sep 26 '17 at 23:23
  • I'm well aware that lxml adding them is not wrong. That isn't the question I'm asking. I want to print a fragment of the original document. The whole purpose of this isn't about valid XML; it's about printing portions of the XML. – Marcel Wilson Sep 26 '17 at 23:55
  • Then write your own serializer. Maybe you can hack the one in lxml to suit your needs. Just realize that your needs are nonstandard and lead to non-interoperable markup. Really, without offering a sound explanation for your unusual needs, readers are left with the impression that you do not understand namespaces and, frankly, by continuing to press the point, do not know what you are doing. – kjhughes Sep 27 '17 at 00:00
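
As a rough illustration of the "hack the serializer" route mentioned in the comments, a minimal sketch could post-process the text that lxml already produces and drop the xmlns declarations from start tags. The helper name `strip_namespace_declarations` and both regular expressions are assumptions made for this example, not part of lxml. The sketch assumes attribute values contain no `>` character and, as the comments warn, its output is not namespace-well-formed, so it is only suitable for logs.

```python
import re

# One namespace declaration inside a start tag, e.g. xmlns:gml="..." or xmlns='...'.
_XMLNS_ATTR = re.compile(r"""\s+xmlns(?::[\w.-]+)?\s*=\s*(?:"[^"]*"|'[^']*')""")
# A start tag or empty-element tag; end tags, comments and PIs do not match.
# Assumption: attribute values never contain a literal '>'.
_START_TAG = re.compile(r"<[A-Za-z_][^<>]*>")


def strip_namespace_declarations(serialized):
    """Drop xmlns declarations from the start tags of a serialized fragment."""
    return _START_TAG.sub(lambda m: _XMLNS_ATTR.sub("", m.group(0)), serialized)


if __name__ == "__main__":
    # Hypothetical input resembling the pretty-printed lxml output in the question.
    fragment = (
        '<gml:boundedBy xmlns:gml="http://www.opengis.net/gml" '
        'xmlns:sei="https://somedomain.com/namespace">\n'
        '  <gml:Box srsName="EPSG:4326"/>\n'
        '</gml:boundedBy>\n'
    )
    print(strip_namespace_declarations(fragment))
```

If the serialized fragment is a bytes object (as returned by `etree.tostring` with an explicit encoding), it would need to be decoded to text before being passed to this helper.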
