I have as input a file named in.json. You can find the content of this file here

Following this answer, I am trying to convert the JSON to CSV with this code:

# rjson and RJSONIO both export fromJSON(); load only rjson,
# whose fromJSON() accepts the file = argument used below
library(rjson)
filename2 <- "C:/Users/Desktop/in.json"
json_data <- fromJSON(file = filename2)

json_data <- lapply(json_data, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})

json <- do.call("rbind", json_data)

df <- json

write.csv(df, file = 'C:/Users/Desktop/final.csv', row.names = FALSE)

However, when I run nrow(df) I see only 2 rows, whereas I expected one row for every project id.

Bil Bal

1 Answer

The JSON you provide as an example does indeed contain only two objects in its top-level array. A call to str shows the structure faithfully:

> str(json_data,max.level=2)
List of 2
 $ :List of 3
  ..$ projects  :List of 1
  ..$ total_hits: num 12596
  ..$ seed      : chr "776766" 
 $ :List of 3
  ..$ projects  :List of 16
  ..$ total_hits: num 12596
  ..$ seed      : chr "776766"
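
To make the two-row result concrete, here is a minimal sketch on a toy list with the same shape as the str output above. The list is built by hand rather than parsed from in.json, and the field names "id" and "name" are made up for illustration.

```r
# Toy stand-in for json_data: two top-level objects, each holding a
# "projects" array (field names "id" and "name" are hypothetical)
toy <- list(
  list(projects = list(list(id = 1, name = "a")),
       total_hits = 12596, seed = "776766"),
  list(projects = list(list(id = 2, name = "b"), list(id = 3)),
       total_hits = 12596, seed = "776766")
)

# The approach from the question: one flattened vector per top-level object
flat <- lapply(toy, unlist)

# The vectors differ in length, so rbind() recycles and warns
mat <- suppressWarnings(do.call("rbind", flat))
nrow(mat)   # one row per top-level object, however many projects it holds
```

Each top-level object becomes a single row, so the number of rows follows the length of the outer array, not the number of projects.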

Guessing that you mean the project id, and that you don't mind losing "total_hits", you simply need to unlist the first two levels of the JSON:

 unlisted <- unlist(unlist(json_data,recursive=FALSE),recursive=FALSE)

And then select the items whose names start with "projects":

 projects <- unlisted[grep("^projects", names(unlisted))]

You can then flatten each project into a named vector with unlist:

data <- lapply(projects,unlist)
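
The three steps above can be tried end to end on a small hand-built list. This is only a sketch: the toy list mimics the structure of json_data, and the field names "id" and "name" are assumptions, not taken from in.json.

```r
# Toy stand-in for json_data (field names "id" and "name" are hypothetical)
toy <- list(
  list(projects = list(list(id = 1, name = "a")),
       total_hits = 12596, seed = "776766"),
  list(projects = list(list(id = 2, name = "b"), list(id = 3)),
       total_hits = 12596, seed = "776766")
)

# Peel off the two outer levels without flattening the projects themselves
unlisted <- unlist(unlist(toy, recursive = FALSE), recursive = FALSE)

# Keep only the entries that came from a "projects" array
projects <- unlisted[grep("^projects", names(unlisted))]

# Flatten each project into a named vector
data <- lapply(projects, unlist)
length(data)   # one entry per project
```

With recursive = FALSE, each call removes exactly one level of nesting, which is why the individual projects survive as separate list elements instead of being merged into one long vector.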

Rbinding is trickier because not all projects have exactly the same fields filled in, so you need to rely on the names. The following is one of many possible solutions, and probably not the optimal one:

# list all the names in all projects
allNames <- unique(unlist(lapply(data, names)))
# have a model row: one NA per field, named after the fields
modelRow <- rep(NA, length(allNames))
names(modelRow) <- allNames

# the function to change your list into a row following the modelRow structure
rowSettingFn <- function(project) {
    row <- modelRow
    for (iItem in seq_along(project)) {
        row[names(project)[iItem]] <- project[[iItem]]
    }
    return(row)
}

# change your data into a matrix; sapply() puts each project in a column,
# so transpose to get one row per project
dataMat <- t(sapply(data, rowSettingFn))
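
As a rough sketch of the last step, the name-based filling can also be done without the loop and the result written out as CSV. The two projects below, the helper name toRow and the temporary output path are all made up for illustration.

```r
# Hypothetical flattened projects with different sets of fields
data <- list(
  projects1 = c(id = "1", name = "a"),
  projects2 = c(id = "2", status = "open")
)

allNames <- unique(unlist(lapply(data, names)))

# Fill a row of NAs by name; fields a project lacks stay NA
toRow <- function(project) {
  row <- setNames(rep(NA_character_, length(allNames)), allNames)
  row[names(project)] <- project
  row
}

# sapply() returns one column per project, so transpose to get rows
dataMat <- t(sapply(data, toRow))
df <- as.data.frame(dataMat, stringsAsFactors = FALSE)

outFile <- tempfile(fileext = ".csv")   # placeholder path
write.csv(df, file = outFile, row.names = FALSE)
```

Assigning by name assumes field names are unique within a project. If a project repeats a name after flattening, later values overwrite earlier ones in that row.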
cmbarbu
  • Is it possible to get rows based on the id? Because even if I fix the assignments, won't the final result still be the same 2 rows only? Thank you for your guidance. – Bil Bal Mar 25 '15 at 13:56
  • @user20650 no you are not ... he edited the question – cmbarbu Mar 25 '15 at 14:07
  • Thank you for your answer. I will test it and come back to you. Thank you, it is a very inspired answer! Yeah, I don't mind losing the "total_hits", but I want to keep all the levels of the JSON. – Bil Bal Mar 28 '15 at 20:00