
I've been working on loading KML files into R to make web maps with Leaflet/Shiny. The import is pretty simple (using this sample KML):

library(rgdal)

sampleKml <- readOGR("D:/KML_Samples.kml", layer = ogrListLayers("D:/KML_Samples.kml")[1])

In this example, ogrListLayers pulls in all of the kml layers, and I subset only the first element/layer. Easy peasy.

The problem is that using this method to read KML layers only pulls in two fields: "Name" and "Description," as seen below:

> sampleKml <- readOGR("D:/KML_Samples.kml", layer = ogrListLayers("D:/KML_Samples.kml")[1])
OGR data source with driver: KML 
Source: "D:/KML_Samples.kml", layer: "Placemarks"
with 3 features
It has 2 fields
> sampleKml@data
                Name                                                                                  Description
1   Simple placemark Attached to the ground. Intelligently places itself at the height of the underlying terrain.
2 Floating placemark                                                  Floats a defined distance above the ground.
3 Extruded placemark                                              Tethered to the ground by a customizable "tail" 

So R reads the KML layer as a SpatialPointsDataFrame with 3 features (3 different points) and two fields (the columns). However, when I pull the layer into QGIS and read its attribute table, there are many fields in addition to Name and Description, seen here.

From what I can tell, 'name' and 'description' are standard child elements of a KML Placemark, and any additional data are considered ExtendedData. I want to import this extended data along with the placemark data.

Is there a way to pull ALL of these KML layer fields/attributes into R? Preferably with readOGR(), but I'm open to all suggestions.

Lauren
  • While I know nothing of GIS or KML, try playing with the args in [readOGR](https://www.rdocumentation.org/packages/rgdal/versions/1.2-8/topics/readOGR) even with *verbose* to see if any pertinent messages appear. – Parfait Aug 31 '17 at 21:17
  • Also, I tested your link in [Google's KML validator](http://googlemapsapi.blogspot.com/2007/06/validate-your-kml-online-or-offline.html) and it passes, however with a recommendation for compatibility in widest range of feed readers: *Placemark should contain a id attribute. This is important if you want to link directly to features*. Here is [link report](http://www.feedvalidator.org/check.cgi?url=https%3A//developers.google.com/kml/documentation/KML_Samples.kml) – Parfait Aug 31 '17 at 21:18
  • What do you want to do with the data once you get your KML into R? I ask b/c some of the fields shown in your QGIS screenshot are boiler-plate fields that are not even written into the KML file (eg: timestamp, begin, end), and there are many other fields in the KML that are not seen by QGIS (eg: LookAt and all its children). You probably don't want/need all of them? Is this sample file (from the KML developers site) the actual KML you want to parse, or do you have another file which has name/value pairs in the ExtendedData tags (or shown as a table in the balloons)? – Christiaan Adams Sep 05 '17 at 15:58
  • @ChristiaanAdams I want to pull in ExtendedData! I will edit my post to clarify (I discovered the term 'extended data' only after I had posted this question). My actual KML is for tropical cyclones, and I want to pull in fields such as date/time, wind speed, etc. – Lauren Sep 05 '17 at 19:28
  • Great, that probably makes it easier. If the data you want is actually in the KML as ExtendedData name/value pairs, you should be able to parse it out pretty easily. Sorry I can't help with R code for that. Just be careful because there are a LOT of KML files out there which were generated by the KML export tools in ESRI's ArcGIS, where the table of data in the balloons is just an HTML blob, and is not stored in ExtendedData name/value pairs, so it's a lot harder to parse. Since your sample file does not contain ExtendedData, I suggest providing one that does, so the R experts can help. – Christiaan Adams Sep 06 '17 at 07:12
  • While this issue hasn't yet been solved, it appears to be because of a compatibility issue between the libkml library and Windows. – Lauren Nov 29 '17 at 15:12
  • Also, this is very on topic for gis.stackexchange.com, similar questions there please? – Spacedman Aug 04 '18 at 13:00

1 Answer


TL;DR

The underlying problem is that the LibKML library is missing on Windows. My solution is to extract the data directly from the KML file with a function.

Problem

I ran into the same problem, and after some googling it appears to be related to LibKML on Windows. Running the code below on my Ubuntu machine gave a different result: the ExtendedData was retrieved when loading the saved KML file.

library(rgdal)
library(dplyr)
poly_df<-data.frame(x=c(1,1,0,0),y=c(1,0,0,1))
poly<-poly_df %>% 
  Polygon %>% 
  list %>% 
  Polygons(ID="1") %>% 
  list %>% 
  SpatialPolygons(proj4string = CRS("+init=epsg:4326")) %>% 
  SpatialPolygonsDataFrame(data=data.frame(test="this is a test"))

writeOGR(poly,"test.kml",driver="KML",layer="poly")
poly2<-readOGR("test.kml")
poly2@data

Anyone who managed to build LibKML [1] would be able to load KML files together with their ExtendedData [2].

On Windows, LibKML needs to be built with Visual Studio 2005 [1]. This Visual Studio version is no longer supported [3]. In [3], user2889419 supplies a link to the 2005 version.
I downloaded and installed that version, but building LibKML eventually failed with a lot of errors and warnings (certain files do not exist). This is where I stopped, because I am way out of my comfort zone, but I wanted to share the results of my chase.
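
Whether a given machine has the LIBKML driver can be checked from R before trying anything else. The following is a minimal sketch, assuming the sf package is installed; it inspects the GDAL build that sf is linked against, which is not necessarily the same build rgdal uses.

```r
# Assumption: the sf package is installed. st_drivers() lists the drivers of
# the GDAL build that sf is linked against.
vector_drivers <- sf::st_drivers("vector")

# "KML" is the basic driver; "LIBKML" is the one built on LibKML
has_libkml <- "LIBKML" %in% vector_drivers$name
has_libkml
```

If this returns FALSE, only the basic KML driver is available, which matches the two-field result shown in the question.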

Solution in R

My solution is to read the KML directly and extract the ExtendedData, while loading the spatial object via rgdal's readOGR. This assumes that readOGR returns the features in the same order in which they appear in the file, which is also the order my extraction routine follows. Both results are then merged, and the output is a SpatialPolygonsDataFrame.
I had some trouble extracting the nodes from the KML files at first because I was not aware of the concept of namespaces [4]. (I later edited the following function because I ran into trouble with KML files of other origins.)

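library(rgdal)  # readOGR()
library(xml2)   # read_xml(), xml_find_all(), xml_attr(), ...
library(dplyr)  # %>%

readKML <- function(file,keep_name_description=FALSE,layer,...) {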
  # Set keep_name_description = TRUE to keep "Name" and "Description" columns
  #   in the resulting SpatialPolygonsDataFrame. Only works when there is
  #   ExtendedData in the kml file.

  sp_obj<-readOGR(file,layer,...)
  xml1<-read_xml(file)
  if (!missing(layer)) {
    different_layers <- xml_find_all(xml1, ".//d1:Folder") 
    layer_names <- different_layers %>% 
      xml_find_first(".//d1:name") %>% 
      xml_contents() %>% 
      xml_text()

    selected_layer <- layer_names==layer
    if (!any(selected_layer)) stop("Layer does not exist.")
    xml2 <- different_layers[selected_layer]
  } else {
    xml2 <- xml1
  }

  # extract name and type of variables

  variable_names1 <- 
    xml_find_first(xml2, ".//d1:ExtendedData") %>% 
    xml_children() 

  while(variable_names1 %>% 
        xml_attr("name") %>% 
        is.na() %>% 
        any()&variable_names1 %>%
        xml_children() %>% 
        length>0) variable_names1 <- variable_names1 %>%
    xml_children()

  variable_names <- variable_names1 %>%
    xml_attr("name") %>% 
    unique()

  # return sp_obj if no ExtendedData is present
  # (xml_attr() returns a zero-length vector, not NULL, when nothing matches)
  if (length(variable_names) == 0) return(sp_obj)

  data1 <- xml_find_all(xml2, ".//d1:ExtendedData") %>% 
    xml_children()

  while(data1 %>%
        xml_children() %>% 
        length>0) data1 <- data1 %>%
    xml_children()

  data <- data1 %>% 
    xml_text() %>% 
    matrix(.,ncol=length(variable_names),byrow = TRUE) %>% 
    as.data.frame()

  colnames(data) <- variable_names

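  if (keep_name_description) {
    # keep "Name" and "Description" and append the ExtendedData columns
    try(sp_obj@data <- cbind(sp_obj@data,data),silent=TRUE)
  } else {
    # replace the attribute table with the ExtendedData columns only
    sp_obj@data <- data
  }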
  sp_obj
}
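
The d1: prefix in the XPath expressions above is how xml2 labels a document's default namespace. The following is a minimal sketch of that behaviour and of a per-placemark extraction. The inline KML, the field names (wind_kt, category) and the helper extended_row() are made up for illustration, and the sketch assumes that every placemark carries the same Data fields.

```r
library(xml2)

# Hypothetical KML with two placemarks, inlined so the sketch is self-contained
kml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Advisory 1</name>
      <ExtendedData>
        <Data name="wind_kt"><value>35</value></Data>
        <Data name="category"><value>TS</value></Data>
      </ExtendedData>
      <Point><coordinates>-80.1,25.3,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Advisory 2</name>
      <ExtendedData>
        <Data name="wind_kt"><value>65</value></Data>
        <Data name="category"><value>H1</value></Data>
      </ExtendedData>
      <Point><coordinates>-81.0,26.0,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>'

doc <- read_xml(kml_text)

# The root declares a default namespace, which xml2 exposes as prefix "d1"
ns <- xml_ns(doc)

# An unprefixed XPath finds nothing; the prefixed one finds the placemarks
n_unprefixed <- length(xml_find_all(doc, ".//Placemark"))
placemarks   <- xml_find_all(doc, ".//d1:Placemark", ns)

# Hypothetical helper: name/value pairs of one placemark as a one-row data frame
extended_row <- function(pm) {
  data_nodes <- xml_find_all(pm, "./d1:ExtendedData/d1:Data", ns)
  values <- xml_text(xml_find_first(data_nodes, "./d1:value", ns))
  as.data.frame(as.list(setNames(values, xml_attr(data_nodes, "name"))),
                stringsAsFactors = FALSE)
}

# Assumes identical fields in every placemark, so rbind() lines the rows up
ext <- do.call(rbind, lapply(placemarks, extended_row))
ext
```

Working placemark by placemark keeps each row tied to its own feature, instead of relying on the total number of values being a multiple of the number of field names.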

Old: extracting via readLines

This earlier version follows the same approach, but it scans the file line by line with readLines instead of parsing it as XML.

library(tidyverse)
library(rgdal)

readKML<-function(file,keep_name_description=FALSE,...) {
  # Set keep_name_description = TRUE to keep "Name" and "Description" columns 
  #   in the resulting SpatialPolygonsDataFrame. Only works when there is 
  #   ExtendedData in the kml file.

  if (!grepl("\\.kml$",file)) stop("File is not a KML file.")
  if (!file.exists(file)) stop("File does not exist.")
  map<-readOGR(file,...)

  f1<-readLines(file)

  # get positions of ExtendedData in document
  exdata_position<-grep("ExtendedData",f1) %>% 
    matrix(ncol=2,byrow = TRUE) %>% 
    apply(1,function(x) {
      pos<-x[1]:x[2]
      pos[2:(length(pos)-1)]
    }) %>% 
    t %>% 
    as.data.frame

  # if there is no ExtendedData return SpatialPolygonsDataFrame
  if (ncol(exdata_position)==0) return(map)

  # Get Name of different columns
  extract1<-f1[exdata_position[1,] %>% 
                 unlist]  
  names_of_data<-extract1 %>% 
    strsplit("name=\"") %>%
    lapply(function(x) strsplit(x[[2]],split="\"") ) %>%
    unlist(recursive = FALSE) %>%
    lapply(function(x) return(x[1])) %>% 
    unlist

  # Extract Extended Data
  dat<-lapply(seq(nrow(exdata_position)),function(x) {
    extract2<-f1[exdata_position[x,] %>% 
                   unlist]  
    extract2 %>% 
      strsplit(">") %>%
      lapply(function(x) strsplit(x[[2]],split="<") ) %>% unlist(recursive = FALSE) %>%
      lapply(function(x) return(x[1])) %>% 
      unlist %>% 
      matrix(nrow=1) %>% 
      as.data.frame
  }) %>% 
    do.call(rbind,.)

  # Rename columns
  colnames(dat)<-names_of_data

  # Check if Name and Description should be dropped
  if (keep_name_description) {
    map@data<-cbind(map@data,dat)
  } else {
    map@data<-dat
  }
  map
}
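
Both versions of readKML rely on readOGR returning the features in document order. That can be checked rather than assumed. The following is a minimal sketch using xml2; the helper name placemark_order_matches() is made up for illustration, and it assumes that the file declares a default namespace (so the d1 prefix exists) and that every Placemark has a name.

```r
library(xml2)

# Hypothetical helper: TRUE when the Placemark <name> values, in document
# order, equal the names reported by the spatial reader.
# Assumes the KML declares a default namespace, which xml2 exposes as "d1".
placemark_order_matches <- function(kml_file, reader_names) {
  doc <- read_xml(kml_file)
  placemarks <- xml_find_all(doc, ".//d1:Placemark")
  xml_names <- xml_text(xml_find_first(placemarks, "./d1:name"))
  identical(xml_names, as.character(reader_names))
}
```

With map being the object returned by readOGR, a call such as placemark_order_matches(file, map$Name) should return TRUE before the two tables are combined.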

[1] https://github.com/google/libkml/wiki/Building-and-installing-libkml
[2] https://github.com/r-spatial/sf/issues/499
[3] Where to download visual studio express 2005?
[4] Parsing XML in R: Incorrect namespaces

Sebastian
  • The test file you linked to in the question doesn't have the word "ExtendedData" in it. Also you should read KML using a package for reading XML files. – Spacedman Aug 03 '18 at 10:00
  • Thanks for your comment! You are right about the XML; I have not had the time to dive into it yet. You are also right that the test file has no ExtendedData, but the question was "Is there a way to pull ALL of these KML layer fields/attributes into R?", which includes the ExtendedData. I should clarify that I provide the code to create the test.kml, which has ExtendedData in it, and then demonstrate a maybe less preferable but working way to extract it. – Sebastian Aug 04 '18 at 10:21
  • I think this is still an issue that kml driver is not reading and making available the relevant info. I did not have any luck with similar code from github using sf. https://github.com/r-spatial/sf/issues/499#issuecomment-411059252 Using online web converters from kml to shp or geojson, then reading in by st_read() did make all relevant fields available, but not very scalable! I have access to RStudio on a server running linux, so I'm reading my kml files there for the moment. – Mark Neal Oct 22 '21 at 05:47