
I have a list of JSON values (actually it's a text file where every line is one JSON object), like this:

{ "id": 1, "name": "john", "age": 18, "education": "master" }
{ "id": 2, "name": "jack", "job": "clerk" }
...

Some of the values can be missing (e.g. the first item doesn't have a "job" value, and the second item doesn't have "education" or "age").

I need to create a data frame in R and fill all missing column values with NA (a column should exist if a field with that name appears in at least one row). What is the easiest way to achieve this?

What I have already done: I installed the "rjson" package and parsed these lines into R lists. Let's assume that the "lines" variable is a character vector of the lines.

library(rjson)
lines <- ...  # initialize the "lines" character vector here
jsons <- sapply(lines, fromJSON)
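
For example, to experiment with the two objects from above, the "lines" variable could be filled in directly (just for testing; in practice it would be read from the file):

# hard-code the two example objects as a character vector for testing
lines <- c(
  '{ "id": 1, "name": "john", "age": 18, "education": "master" }',
  '{ "id": 2, "name": "jack", "job": "clerk" }'
)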

"jsons" variable became "list of lists" (every JSON object is converted to list in R terminology). How to convert it to data.frame?

I want to see the following data frame for the example I provided:

"id" | "name" | "age" | "education" | "job"
-------------------------------------------
1    | "john" |  18   |  "master"   |   NA
2    | "jack  |  NA   |     NA      | "clerk"
pkozlov

2 Answers


From plyr, you can use rbind.fill to add the NAs for you:

library(plyr)
rbind.fill(lapply(jsons, data.frame))  # one single-row data.frame per object, missing columns filled with NA

#   id name age education   job
# 1  1 john  18    master  <NA>
# 2  2 jack  NA      <NA> clerk

or from data.table

library(data.table)
rbindlist(jsons, fill = TRUE)

and dplyr

library(dplyr)
bind_rows(lapply(jsons, data.frame))
Rorschach
  • There is one issue, though. If any of the fields in the JSON has an explicit null value, this code fails with the error "arguments imply differing number of rows: 1, 0" (a possible workaround is sketched below these comments). – pkozlov Jul 23 '15 at 14:51
  • @pkozlov - again, I'll point out that `jsonlite` deals with this issue with robust checking, and will just read `null` as final `NA` values in a data.frame. – thelatemail Jul 23 '15 at 23:36
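
A possible workaround for that null issue (a sketch, not part of the original answer): rjson turns a JSON null into NULL, and data.frame() errors on the resulting zero-length column, so replace NULL fields with NA before building the per-row data frames.

# sketch: swap explicit NULLs for NA so data.frame() sees one value per field
jsons_no_null <- lapply(jsons, function(x) {
  x[vapply(x, is.null, logical(1))] <- NA
  x
})
rbind.fill(lapply(jsons_no_null, data.frame))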

Future me, correcting past me's mistakes: it would make more sense to use jsonlite's stream_in:

library(jsonlite)
stream_in(txtfile)  # txtfile should be a connection to the newline-delimited JSON file

# To test on `txt` from below, try:
# stream_in(textConnection(txt))

# Found 2 records...
# Imported 2 records. Simplifying...
#  id name age education   job
#1 NA john  18    master  <NA>
#2  2 jack  NA      <NA> clerk
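
For larger files, stream_in can also work through the data in pages instead of reading everything at once, via its handler and pagesize arguments. A rough sketch (the file name "big_data.ndjson" is made up):

library(jsonlite)
# process the newline-delimited JSON in pages of 10000 records;
# the handler is called with the parsed records (a data.frame by default) for each page
process_page <- function(df) {
  message(nrow(df), " records in this page")  # replace with real per-page processing
}
stream_in(file("big_data.ndjson"), handler = process_page, pagesize = 10000)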

Use the jsonlite package's fromJSON function, after making a few inline edits to your original text data (I've also edited the first piece of id data to include an explicit null value, to show that it deals with this):

fromJSON(paste0("[", gsub("}\n", "},\n", txt), "]"))
#  id name age education   job
#1 NA john  18    master  <NA>
#2  2 jack  NA      <NA> clerk

All I did was add a little formatting to wrap all the JSON lines together in [ and ], and add a comma at the end of each closing } - resulting in output like the below, which can be processed all at once by jsonlite::fromJSON:

[{"1":"one"},{"2":"two"}]

Where txt was your lines of data as presented, with a null in the id variable:

txt <- "{ \"id\": null, \"name\": \"john\", \"age\": 18, \"education\": \"master\" }
{ \"id\": 2, \"name\": \"jack\", \"job\": \"clerk\" }"
thelatemail
  • I don't like this option. The reason this text file has one JSON object per line is that the file can be very big - and reading it as one big JSON array can be memory- and time-consuming. It's better to process each line independently; it can even be done in parallel, I think. Thank you for the response anyway. – pkozlov Jul 23 '15 at 10:32
  • @pkozlov - if you're using `sapply` to loop over each line in R, you'll have to load whatever part you are operating on into R's memory anyway. There's nothing stopping you from `scan`ning in 10 thousand lines at a time, and then processing/parallel processing using this method. Also, jsonlite is better scaled for large data as per: http://stackoverflow.com/a/2062865/496803 – thelatemail Jul 23 '15 at 10:41