I'm trying to convert a nested json file to a data frame in R using the following function:
rf1 <- function(data) {
master <-
data.frame(
id = character(0),
awardAmount = character(0),
awardStatus = character(0),
tenderAmount = character(0)
)
for (i in 1:nrow(data)) {
temp1 <- unlist(data$data$awards[[i]]$status)
length <- length(temp1)
temp2 <- rep(data$data$id[i], length)
temp3 <- rep(data$data$value$amount[[i]], length)
temp4 <- unlist(data$data$awards[[i]]$value[[1]])
tempDF <-
data.frame(id = temp2,
awardAmount = temp4,
awardStatus = temp1,
tenderAmount = temp3)
master <- rbind(master, tempDF)
}
return(master)
}
Here's an example of the json files I'm using:
{
"data" : {
"id" : "3f066cdd81cf4944b42230ed56a35bce",
"awards" : [
{
"status" : "unsuccessful",
"value" : {
"amount" : 76
}
},
{
"status" : "active",
"value" : {
"amount" : 41220
}
}
],
"value" : {
"amount" : 48000
}
}
},
{
"data" : {
"id" : "9507162e6ee24cef8e0ea75d46a81a30",
"awards" : [
{
"status" : "active",
"value" : {
"amount" : 2650
}
}
],
"value" : {
"amount" : 2650
}
}
},
{
"data" : {
"id" : "a516ac43240c4ec689f3392cf0c17575",
"awards" : [
{
"status" : "active",
"value" : {
"amount" : 2620
}
}
],
"value" : {
"amount" : 2650
}
}
}
As you can see, the three observations have different number of awards (the first observation has two awards while the other two have only one). Since I'm looking for a table-view data frame, I'm filling the empty cells with repetitive information such as data$id
and data$value$amount
.
The json file has approximately 100,000 observations, so it takes forever to return a data frame (I've been waiting for more than 30 minutes and still no result). I think that there might be a way to run all the temp
lines in parallel, which should save a lot of time, but I'm not sure how to implement that in my code.
To give you a sense of the output I'm looking for, I limited my function to for (i in 1:3)
, which produced the following data frame. My question is how to do the same thing but for 100,000 observations. Note, the json example corresponds to the sample output.
Desired output: