XPath is kicking my trash, and I'm not sure why it's so hard for me to get it working. Below is what I'm trying, along with some quotes from the documentation.
The sanitized file:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Document>
<Style id="PolyStyle70412"></Style>
<Folder>
<Placemark id="default_starting_location"></Placemark>
<Folder>
<name>LandParcels</name>
<visibility>0</visibility>
<Folder>
<name>LandParcels batch 1-2</name>
<visibility>0</visibility>
<Document id="LandParcels" xsi:schemaLocation="http://www.opengis.net/kml/2.2 http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd http://www.google.com/kml/ext/2.2 http://code.google.com/apis/kml/schema/kml22gx.xsd">
<name>LandParcels</name>
<visibility>0</visibility>
<Snippet maxLines="0"></Snippet>
<description>text</description>
<Style id="PolyStyle70"></Style>
<Folder id="FeatureLayer7">
<Placemark id="ID_70000">
<name>12345678</name>
<MultiGeometry>
<coordinates>100,200</coordinates>
</MultiGeometry>
</Placemark>
</Folder>
</Document>
</Folder>
</Folder>
</Folder>
</Document>
</kml>
The problem and expected result:
>>> from lxml import etree
>>> doc = etree.parse('/path/to/file.xml')
>>> print(doc.xpath('Placemark')) # Returns an empty list
According to the XPath syntax documentation, the above should select all nodes with the tag 'Placemark'.
>>> print(doc.xpath('//Placemark')) # Returns an empty list
According to the same source, that should select all nodes no matter where they are in the document.
>>> print(doc.xpath('/kml')) # Returns an empty list
Again, this should select the root node... Nothing works!
Well, this works:
>>> print(doc.xpath('/*')) # Returns the kml node
>>> print(doc.xpath('/*/*')) # Returns the Document node
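One thing that might explain the difference, sketched on a trimmed-down stand-in for the file and assuming the default `xmlns` on the root element is what matters: `*` matches elements in any namespace, while a bare name such as `kml` only matches elements in no namespace. The `KML` string and the `p1` id below are made up for illustration.

```python
from lxml import etree

# Hypothetical, trimmed-down stand-in for the file above;
# only the default namespace declaration is kept.
KML = (b'<kml xmlns="http://www.opengis.net/kml/2.2">'
       b'<Document><Placemark id="p1"/></Document></kml>')

root = etree.fromstring(KML)

# lxml reports tags in "{namespace-uri}localname" form,
# so the namespace is part of the element's name.
root_tag = root.tag
print(root_tag)

# A bare name only matches elements in no namespace, so nothing is found.
bare = root.xpath('//Placemark')
print(bare)

# local-name() ignores the namespace, so this does match.
by_local_name = root.xpath('//*[local-name()="Placemark"]')
print([el.get('id') for el in by_local_name])
```

If the root tag prints with a `{...}` prefix, that would suggest the unprefixed paths are failing because of the namespace rather than because of the path structure.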
OK, so I know that final one is not how you are supposed to get the Document node, but since we have it with that, I try to start there and drill down to get the coordinates:
>>> print(doc.xpath('/*/*/Folder/Folder/Folder/Document/Folder/Placemark/MultiGeometry/coordinates')) # Returns an empty list
I've tried a lot of other things too. Why does nothing but the slash-star syntax seem to work? Yes, I've tried lots of StackOverflow searches. People give toy XML files and some code, and when I paste those in, they work. So why can't I get any good results with my file?
Ultimately I'm trying to extract these nodes:
/kml/Document/Folder/Folder/Folder/Document/description
/kml/Document/Folder/Folder/Folder/Document/Folder/Placemark/MultiGeometry/coordinates
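For reference, here is a minimal sketch of how those two paths might be written if the default namespace is the culprit. It assumes lxml's `xpath()` with a `namespaces=` mapping; the prefix `k` is an arbitrary choice of mine, and the `KML` string is a shortened copy of the file above with the other namespace declarations dropped.

```python
from lxml import etree

# Shortened copy of the sanitized file, keeping the same nesting.
KML = b'''<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><Folder><Folder><Folder>
<Document id="LandParcels">
<description>text</description>
<Folder id="FeatureLayer7">
<Placemark id="ID_70000">
<name>12345678</name>
<MultiGeometry><coordinates>100,200</coordinates></MultiGeometry>
</Placemark>
</Folder>
</Document>
</Folder></Folder></Folder></Document>
</kml>'''

root = etree.fromstring(KML)

# Map an arbitrary prefix ('k' is my choice) to the KML namespace URI.
NS = {'k': 'http://www.opengis.net/kml/2.2'}

# Every step in the path carries the prefix.
descriptions = root.xpath(
    '/k:kml/k:Document/k:Folder/k:Folder/k:Folder/k:Document/k:description/text()',
    namespaces=NS)

# The same idea with a shorter path that searches at any depth.
coordinates = root.xpath(
    '//k:Placemark/k:MultiGeometry/k:coordinates/text()',
    namespaces=NS)

print(descriptions)
print(coordinates)
```

The prefix used in the expression does not have to match any prefix declared in the file; only the namespace URI it maps to has to match.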