Manipulation of JSON string in R before conversion to data.frame

Question

I have a URL that updates periodically with some JSON data, which I would like to convert to a data.table in R with the following code:

library(jsonlite)
fromJSON("http://...")

However, this is not working, and I think it is due to the way the JSON is structured. As far as I understand, the file is currently structured as follows:

{"h1":[{"h2":[{"Name":"Column1Header","Value":"Row1Column1Value"},{"Name":"Column2Header","Value":"Row1Column2Value"}]},{"h2":[{"Name":"Column1Header","Value":"Row2Column1Value"},{"Name":"Column2Header","Value":"Row2Column2Value"}]}]}

I think that if I could read the contents of the URL as one long string, reshape it into something like what is below, and then call fromJSON() on the result, I would be able to obtain the data.table I want.

[{"Column1Header":"Row1Column1Value","Column2Header":"Row1Column2Value"},{"Column1Header":"Row2Column1Value","Column2Header":"Row2Column2Value"}]

Any idea how I can achieve this? My attempt involved reading the data with readLines() and using gsub() to replace the bits I don't need. However, readLines() appears to place "\" all through the data, which I am having all sorts of trouble removing, and even if I got past that, my gsub() approach wouldn't be very robust.

Any help would be much appreciated, as reading the file as it is now doesn't let me get to the level of detail I need to extract the "Name"/"Value" pairs required to build my data.table.

P.S. Something tells me that the original JSON file is transposed for some reason, as the column names don't necessarily comply with the export system's naming convention.

The lack of a URL usually signifies either scraping data from a site in violation of its ToS or an internal site that can't be mentioned. I ask "which of those two things is it?" knowing that >67% of the time the answer lacks veracity, but I still hold out hope for honesty. - hrbrmstr, Feb 21 '17 at 03:05
No, not at all. The data is project data that is only accessible from the intranet at my workplace. The intention is for me to get an update of this data every 15 minutes, manipulate it, and publish it to a Tableau server for reporting. - Ayelavan, Feb 21 '17 at 03:37

score 0 · Answer 1 · edited May 23 '17 at 12:31

You need to look at this post, the one with 29 upvotes:

A way of unlisting JSON.

Your JSON is structurally valid. So the problem is un-nesting it so that it can be saved as a rectangle (a data frame), because its shape is not rectangular right now.

If you can, cut and paste your JSON from above to the validator here: JSON Viewer

...you can see that your h1 covers many frames and each h2 a few frames, and that they are all nested.

In order to use this in your dataframe, you need to un-nest (unlist the lists of key:value pairs) and assign each layer of data to a column instead.

The post above has a rock-solid method for doing this quickly and easily with the RJSONIO package, combining an apply function with unlist. You should be able to tweak it to untangle your data!
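A minimal sketch of that un-nesting idea, applied to the sample JSON from the question. It uses jsonlite (the package the question already loads) rather than RJSONIO, so it is not the linked post's exact method. The helper name pairs_to_row is made up for illustration, and the sketch assumes every h2 block carries the same set of Name entries.

```r
library(jsonlite)

# Sample JSON copied from the question
x <- '{"h1":[{"h2":[{"Name":"Column1Header","Value":"Row1Column1Value"},{"Name":"Column2Header","Value":"Row1Column2Value"}]},{"h2":[{"Name":"Column1Header","Value":"Row2Column1Value"},{"Name":"Column2Header","Value":"Row2Column2Value"}]}]}'

# Keep the nested list structure instead of letting jsonlite simplify it
res <- fromJSON(x, simplifyVector = FALSE)

# Hypothetical helper: turn one h1 element (holding a list of Name/Value
# pairs under "h2") into a one-row data.frame
pairs_to_row <- function(block) {
  pairs <- block$h2
  values <- lapply(pairs, function(p) p$Value)
  names(values) <- vapply(pairs, function(p) p$Name, character(1))
  data.frame(values, check.names = FALSE, stringsAsFactors = FALSE)
}

# One row per element of h1, stacked into a single data.frame
# (assumes all rows share the same column names)
df <- do.call(rbind, lapply(res$h1, pairs_to_row))
print(df)
```

If some rows can be missing a Name, rbind() will fail on the mismatched columns, so a binding function that fills gaps with NA would be needed instead.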

score 0 · Answer 2 · answered Feb 21 '17 at 05:13

Something like:

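library(jsonlite)
library(purrr)  # also exports %>%; map_df() needs dplyr to be installed

x <- '{"h1":[{"h2":[{"Name":"Column1Header","Value":"Row1Column1Value"},{"Name":"Column2Header","Value":"Row1Column2Value"}]},{"h2":[{"Name":"Column1Header","Value":"Row2Column1Value"},{"Name":"Column2Header","Value":"Row2Column2Value"}]}]}'

# Parse without simplification so the nested list structure is preserved
res <- fromJSON(x, simplifyVector=FALSE)

# Each "h2" entry becomes one row: unlist it, then use the "Name" entries as
# column names and the remaining ("Value") entries as that row's values
map(res$h1, "h2") %>%
  map(unlist) %>%
  map_df(function(x) {
    y <- names(x)
    nam <- which(grepl("Name", y))
    val <- which(!grepl("Name", y))
    setNames(as.list(x[val]), x[nam])
  })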
## # A tibble: 2 × 2
##      Column1Header    Column2Header
##              <chr>            <chr>
## 1 Row1Column1Value Row1Column2Value
## 2 Row2Column1Value Row2Column2Value

?
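On the "\" characters mentioned in the question: readLines() does not insert backslashes. They are how print() displays the quotes inside a string, so nothing needs to be stripped with gsub(). A small sketch of reading the text in as one string follows; the helper name read_json_text is made up for illustration, and a temporary file stands in for the intranet URL, which is not given.

```r
library(jsonlite)

# Hypothetical helper: read JSON text from a file path or URL into one string
read_json_text <- function(src) {
  paste(readLines(src, warn = FALSE), collapse = "")
}

# Stand-in for the real URL: write a cut-down sample to a temporary file
tmp <- tempfile(fileext = ".json")
writeLines('{"h1":[{"h2":[{"Name":"Column1Header","Value":"Row1Column1Value"}]}]}', tmp)

txt <- read_json_text(tmp)

print(txt)      # displays \" because print() escapes quotes inside strings
cat(txt, "\n")  # displays the characters actually stored; no backslashes

# The string can be parsed directly, with no gsub() clean-up
res <- fromJSON(txt, simplifyVector = FALSE)
```

The resulting res has the same nested shape as in the code above, so the same map() pipeline can be applied to it.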
