
Edit: In fact, it appears that htmlTreeParse does not parse KML files well. In that case, xmlTreeParse is what is needed.


I am trying to parse a large KML file in R. My problem is using XPath to navigate through the nodes of the tree. However I approach it, I can't get it to work, since the functions are designed for XML and HTML files. My final goal is to get a list of strings for all the nodes under each Placemark node.

# parse kml file:

library(XML)

pc2 <- htmlTreeParse(file = "http://www.doogal.co.uk/kml/EC.kml")
pc3 <- htmlTreeParse(file = "http://www.doogal.co.uk/kml/EC.kml", useInternalNodes = TRUE)

# doesn't work
pc2["//@Placemark"]

# doesn't work either
xpathApply(pc3, "//@Placemark")

Is there a way to do this, or does the KML format prevent it?

So far, the only way I have found is to do it manually, accessing each node directly, but that is not best practice.

pc4 <- htmlTreeParse(file = "http://www.doogal.co.uk/kml/EC.kml")$doc$children$kml ....
+ for loop 

Edit: There is a strange effect here: when I download the file, it is a KML file that begins with a kml tag. When I use htmlTreeParse, it adds an html level:

<!DOCTYPE html PUBLIC "-//EN" "http://www.w3">
<?xml version="1.0" encoding="UTF-8"?>
<!-- comment here-->
<html>
     <body>
     <kml xmlns="http://www.opengis.net/kml/2.2">
         <document> 
my document here
</document></kml></body></html>

The HTML parser reacts strangely to this. To correct it, I use xmlTreeParse instead, and in the end it works fine.
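A minimal sketch of the XML-based approach, assuming the file uses the default namespace http://www.opengis.net/kml/2.2 shown in the output above. The inline sample data and the prefix `k` are hypothetical, chosen only so the example runs without downloading the real file. Because the document has a default namespace, the XPath needs a prefix bound to that namespace, and it selects the element with `//k:Placemark` rather than an attribute with `//@Placemark`.

```r
library(XML)

# Small inline KML sample (hypothetical data) so the sketch runs offline.
kml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>EC1A</name>
      <description>First area</description>
    </Placemark>
    <Placemark>
      <name>EC1M</name>
      <description>Second area</description>
    </Placemark>
  </Document>
</kml>'

# Parse as XML (not HTML) into an internal document that supports XPath.
doc <- xmlParse(kml_text, asText = TRUE)

# Bind the default KML namespace to an arbitrary prefix for use in XPath.
ns <- c(k = "http://www.opengis.net/kml/2.2")

# Select every Placemark element (an element, so no "@" in the path).
placemarks <- getNodeSet(doc, "//k:Placemark", namespaces = ns)

# Text of one specific child across all Placemarks.
placemark_names <- xpathSApply(doc, "//k:Placemark/k:name", xmlValue,
                               namespaces = ns)

# For each Placemark, a named character vector of its child elements' text.
placemark_values <- lapply(placemarks, function(node) {
  children <- getNodeSet(node, "./*")
  values <- vapply(children, xmlValue, character(1))
  names(values) <- vapply(children, xmlName, character(1))
  values
})
```

For the real file, the same calls should apply after replacing `kml_text` with the downloaded file's path and dropping `asText = TRUE`, provided the file declares the same namespace.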

YCR
  • The XML file at that URL has a *default namespace*; see if this helps: http://stackoverflow.com/questions/24954792/xpath-and-namespace-specification-for-xml-documents-with-an-explicit-default-nam/24955051#24955051 (the XML also doesn't contain a `meta` element, AFAICS) – har07 Dec 01 '15 at 13:52
  • Although the markup is structurally different, KML files are compliant XML files, so you should be able to use R's XML functions such as `xmlTreeParse()` and `xpathSApply()`. – Parfait Dec 01 '15 at 13:56
  • Thanks, I have modified the XPath. OK, so the problem probably comes from the file. The link helps to explain the issue but doesn't provide a solution so far. I'll keep looking. – YCR Dec 01 '15 at 14:23

0 Answers