I have a data.table where one of the columns contains JSON. I am trying to extract the content so that each variable is a column.
library(jsonlite)
library(data.table)
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
'{"tag_id":"4","response_id":1,"other":4}',
'{"tag_id":"34"}'),stringsAsFactors=F)
The desired result, that does not refer to the "other" variable:
tag_id response_id
1 "34" 2
2 "4" 1
3 "34" NA
I have tried several versions of:
parseLog <- function(x){
if (is.na(x))
e=c(tag_id=NA,response_id=NA)
else{
j=fromJSON(x)
e=c(tag_id=as.integer(j$tag_id),response_id=j$response_id)
}
e
}
that seems to work well to retrieve a list of vectors (or lists if c is replaced by list) but when I try to convert the list to data.table something doesn´t work as expected.
parsed<-lapply(df$a,parseLog)
rparsed<-do.call(rbind.data.frame,parsed)
colnames(rparsed)<-c("tag_id","response_id")
Because of the missing value in the third row. How can I solve it in a R-ish clean way? How can I make that my parse method returns an NA for the missing value. Alternative, Is there a parameter "fill" like there is for rbind that can be used in rbind.data.frame or analogous method?
The dataset I am using has 11M rows so performance is important.
Additionally, there is an equivalent method to rbind.data.frame to obtain a data.table. How would that be used? When I check the documentation it refers me to rbindlist but it complains the parameter is not used and if call directly(without do.call it complains about the type of parsed):
rparsed<-do.call(rbindlist,fill=T,parsed)
EDIT: The case I need to cover is more general, in a set of 11M records all the possible circumstances happen:
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
'{"trash":"34","useless":2}',
'{"tag_id":"4","response_id":1,"other":4}',
NA,
'{"response_id":"34"}',
'{"tag_id":"34"}'),stringsAsFactors=F)
and the output should only contain tag_id and response_id columns.