1

I want to convert a json-file into a dataframe in R. With the following code:

link <- 'https://www.dropbox.com/s/ckfn1fpkcix1ccu/bevingenbag.json'
document <- fromJSON(file = link, method = 'C')
bev <- do.call("cbind", document)

i'm getting this:

    type                features
1   FeatureCollection   list(type = "Feature", geometry = list(type = "Point", coordinates = c(6.54800000288927, 52.9920000044505)), properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751"))

which is the first row of a matrix. All the other rows have the same structure. I'm interested in the properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751") part, which should be converted into a dataframe with the columns gid, yymmdd, lat, lon, mag, depth, knmilocatie, baglocatie, tijd.

I searched for and tryed several solutions but none of them worked. I used the rjson package for this. I also tryed the RJSONIO & jsonlite package, but was unable to extract the desired information.

Anyone an idea how to solve this problem?

Jaap
  • 81,064
  • 34
  • 182
  • 193

2 Answers2

4

Here's a way to obtain the data frame:

library(rjson)
document <- fromJSON(file = "bevingenbag.json", method = 'C')

dat <- do.call(rbind, lapply(document$features, 
                             function(x) data.frame(x$properties)))

Edit: How to replace empty values with NA:

dat$baglocatie[dat$baglocatie == ""] <- NA

The result:

head(dat)

      gid   yymmdd    lat   lon mag depth knmilocatie baglocatie   tijd
1 1496600 19861226 52.992 6.548 2.8   1.0       Assen      Assen  74751
2 1496601 19871214 52.928 6.552 2.5   1.5   Hooghalen  Hooghalen 204951
3 1496602 19891201 52.529 4.971 2.7   1.2   Purmerend    Kwadijk 200914
4 1496603 19910215 52.771 6.914 2.2   3.0       Emmen      Emmen  21116
5 1496604 19910425 52.952 6.575 2.6   3.0   Geelbroek    Ekehaar 102631
6 1496605 19910808 52.965 6.573 2.7   3.0     Eleveld      Assen  40114
Sven Hohenstein
  • 80,497
  • 17
  • 145
  • 168
  • Thanx, it works :-) One more question: some of the values in the column `baglocatie` are empty. How do I replace them with NA's or the value from `knmilocatie`? – Jaap Jan 14 '14 at 20:56
4

This is just another, quite similar, approach.

@SvenHohenstein's approach creates a dataframe at each step, an expensive process. It's much faster to create vectors and re-type the whole result at the end. Also, Sven's approach makes each column a factor, which might or might not be what you want. The approach below runs about 200 times faster. This can be important if you intend to do this repeatedly. Finally, you will need to convert columns lon, lat, mag, and depth to numeric.

library(microbenchmark)
library(rjson)

document <- fromJSON(file = "bevingenbag.json", method = 'C')

json2df.1 <- function(json){   # @SvenHohenstein approach
  df <- do.call(rbind, lapply(json$features, 
                       function(x) data.frame(x$properties, stringsAsFactors=F)))
  return(df)
}

json2df.2 <- function(json){
  df <- do.call(rbind,lapply(json[["features"]],function(x){c(x$properties)}))
  df <- data.frame(apply(result,2,as.character), stringsAsFactors=F)
  return(df)
}

microbenchmark(x<-json2df.1(document), y<-json2df.2(document), times=10)
# Unit: milliseconds
#                     expr        min         lq     median         uq        max neval
#  x <- json2df.1(document) 2304.34378 2654.95927 2822.73224 2977.75666 3227.30996    10
#  y <- json2df.2(document)   13.44385   15.27091   16.78201   18.53474   19.70797    10
identical(x,y)
# [1] TRUE
jlhoward
  • 58,004
  • 7
  • 97
  • 140
  • 1
    why there is no function similar to XMLtoDataFrame for JSON (just like when XML is parsed)? – userJT Aug 14 '14 at 06:08