3

I have a data-frame with over 3000 records which include the latitude and longitude coordinates of each observation. I would like to get the country and state or province from each set of coordinates.

I appear to have a partial solution, but I am a neophyte with R and do not understand how to extract the information from the JSON output into a data-frame I can bind to the original dataset.

How to I parse the nested-list created by fromJSON into a data.frame? Specifically, I would like the new data-frame to look something like:

Latitude, Longitude, Country, State (column names)

Alternatively, a better solution to my problem of getting the spatial information is appreciated!

Here is my code:

library(RDSTK)
library(httr)
library(rjson)
Coords <- structure(list(Latitude = c(43.30528, 46.08333, 32.58333, 46.25833, 45.75, 46.25, 45.58333, 45.58333, 44.08333, 45.75), 
                         Lontitude = c(-79.80306, -82.41667, -117.08333, -123.975, -85.75, -123.91667, -86.75, -86.75, -76.58333, -85.25
                                         )), .Names = c("Latitude", "Longitude"), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,9L, 10L), class = "data.frame")

json_file <- fromJSON(coordinates2politics(Coords$Latitude, Coords$Longitude))
Keith W. Larson
  • 1,543
  • 2
  • 19
  • 34
  • 1
    Possible duplicate of [Parse JSON with R](http://stackoverflow.com/questions/2061897/parse-json-with-r) – alistaire Jul 06 '16 at 22:42
  • @alistaire I have read through the post and either I do not get it or it does not solve my problem! Sorry if it is the former, but as I said, I am new to working with JSON output and lists. – Keith W. Larson Jul 06 '16 at 22:45

2 Answers2

4

I prefer to use jsonlite for parsing JSON in R.

To parse the nested JSON list, you can do the fromJSON call inside an lapply.

jsonlite::fromJSON tries to simplify the result for you. But, as JSON is designed to handle nested data structures, you're often returned a data.frame of lists, so to get the data.frame you're after, you need to know which element of the list you want, and extract it.

For example

library(RDSTK)
library(jsonlite)

js <- coordinates2politics(Coords$Latitude, Coords$Longitude)
lst <- lapply(js, jsonlite::fromJSON)

lst[[1]]$politics
#           type friendly_type                       name  code
# 1       admin2       country                     Canada   can
# 2       admin4         state                    Ontario  ca08
# 3 constituency  constituency            Hamilton Centre 35031
# 4 constituency  constituency                 Burlington 35010
# 5 constituency  constituency Hamilton East-Stoney Creek 35032

To get the data.frame, you can construct another lapply to extract the elements you want, and then put it altogether with either a do.call(..., rbind), or my preferences is with data.table::rbindlist(...)

lst_result <- lapply(lst, function(x){
    df <- x$politics[[1]]
    df$lat <- x$location$latitude
    df$lon <- x$location$longitude
    return(df)
})

data.table::rbindlist(lst_result)

#            type friendly_type                                  name                   code      lat        lon
# 1:       admin2       country                                Canada                    can 43.30528  -79.80306
# 2:       admin4         state                               Ontario                   ca08 43.30528  -79.80306
# 3: constituency  constituency                       Hamilton Centre                  35031 43.30528  -79.80306
# 4: constituency  constituency                            Burlington                  35010 43.30528  -79.80306
# 5: constituency  constituency            Hamilton East-Stoney Creek                  35032 43.30528  -79.80306
# 6:       admin2       country                                Canada                    can 46.08333  -82.41667
# 7:       admin4         state                               Ontario                   ca08 46.08333  -82.41667

Alternatively, to get more detail about each lat/lon you can use Google's API through library(googleway) (Disclaimer: I wrote googleway) to reverse geocode the lat/lons.

For this you need a valid Google API key (which is limited to 2,500 requests per day, unless you pay)

library(googleway)

key <- "your_api_key"

lst <- apply(Coords, 1, function(x){
    google_reverse_geocode(location = c(x["Latitude"], x["Longitude"]),
                           key = key)
})

lst[[1]]$results$address_components
# [[1]]
#                              long_name                           short_name                                  types
# 1 Burlington Bay James N. Allan Skyway Burlington Bay James N. Allan Skyway                                  route
# 2                           Burlington                           Burlington                    locality, political
# 3         Halton Regional Municipality         Halton Regional Municipality administrative_area_level_2, political
# 4                              Ontario                                   ON administrative_area_level_1, political
# 5                               Canada                                   CA                     country, political
# 6                                  L7S                                  L7S        postal_code, postal_code_prefix

Or similarly through library(ggmap), also limited by Google's 2,500 limit.

library(ggmap)

apply(Coords, 1, function(x){
    revgeocode(c(x["Longitude"], x["Latitude"]))
})

# 1 
# "Burlington Bay James N. Allan Skyway, Burlington, ON L7S, Canada" 
# 2 
# "308 Brennan Harbour Rd, Spanish, ON P0P 2A0, Canada" 
# 3 
# "724 Harris Ave, San Diego, CA 92154, USA" 
# 4 
# "30 Cherry St, Chinook, WA 98614, USA" 
# 5 
# "St James Township, MI, USA" 
# 6 
# "US-101, Chinook, WA 98614, USA" 
# 7 
# "2413 II Rd, Garden, MI 49835, USA" 
# 8 
# "2413 II Rd, Garden, MI 49835, USA" 
# 9 
# "8925 S Shore Rd, Stella, ON K0H 2S0, Canada" 
# 10 
# "Charlevoix County, MI, USA"
SymbolixAU
  • 25,502
  • 4
  • 67
  • 139
  • I guess my question is unclear. I would like a dataframe output that includes, Lat, Long, Country, State or Province. – Keith W. Larson Jul 06 '16 at 22:26
  • thank you and I see where this is going, but I am still stuck because your solution gives me four records for each original observation. Still trying to wrap my head around 'rbindlist' and how it chooses to arrange the data. – Keith W. Larson Jul 06 '16 at 22:54
  • @KeithLarson the "4 records" are because the `?coordinates2politics` function returns "the countries, states, provinces, cities, constituencies and neighbourhoods that the latitude and longitude point lies within". So you get multiple data points for each coordinate. – SymbolixAU Jul 06 '16 at 23:00
  • 1
    @KeithLarson - I've updated with another option using Google's API. – SymbolixAU Jul 06 '16 at 23:44
1

That json-list needs to be extracted. You really only have a result from your first coordinate:

sapply(json_file[[1]]$politics, "[[", 'name')[ # now pick correct names with logical
        sapply(json_file[[1]]$politics, "[[", 'friendly_type') %in% c("country","state") ] 
[1] "Canada"  "Ontario"

You should have used apply to run all the coordinates one-by-one through the fromJSON(coordinates2politics( .,.) extraction since the function appears not to be "vectorized".

res=apply( Coords, 1, function(x) {fromJSON(coordinates2politics(x['Latitude'], 
                                                                 x['Longitude']) )} )
sapply( res, function(x) sapply(x[[1]]$politics, "[[", 'name')[
                             sapply(x[[1]]$politics, "[[", 'friendly_type') %in% 
                                                                c("country","state")] )
$`1`
[1] "Canada"  "Ontario"

$`2`
[1] "Canada"  "Ontario"

$`3`
[1] "United States" "California"    "Mexico"        "California"   

$`4`
[1] "United States"

$`5`
[1] "United States" "Michigan"     

$`6`
[1] "United States" "Washington"   

$`7`
[1] "United States" "Michigan"     

$`8`
[1] "United States" "Michigan"     

$`9`
[1] "Canada"  "Ontario"

$`10`
[1] "United States" "Michigan" 

Apparently items near the border (like San Diego County or Chula Vista) will give ambiguous results.

IRTFM
  • 258,963
  • 21
  • 364
  • 487