0

I have a particular Json file about data. I'd like to convert it to a Data Frame and apply some transformation. Here is the Json:

{
  "OuterVariable1": [
    {
      "InnerData": "car",
      "InnerList": {
        "wheels": {
          "cost-taxed": 231,
          "cost": 850
        },
        "engine": {
          "cost-taxed": 108,
          "cost": 286
        }
      }
    },
    {
      "InnerData": "van",
      "InnerList": {
        "frame": {
          "cost-taxed": 302,
          "cost": 250
        }
      }
    }, 
   ],

  "OuterVariable2": [
     {
      "InnerData": "truck",
      "InnerList": {
        "wheels": {
          "cost": 1400
        },
      }
    },
  ],
   ...
}

Here is what I want. Notice how OuterVariable is a new column, and we separate the entries for each InnerList.

OuterVariable InnerData InnerList Cost
OuterVariable1 "car" "wheels" 850
OuterVariable1 "car" "engine" 286
OuterVariable1 "van" "frame" 250
OuterVariable2 "truck" "wheels" 1400

How can I achieve this? I've used jsonlite to get an initial DataFrame and tried to use lapply to do some transformation with no luck.

BeLambda
  • 887
  • 1
  • 10
  • 19

1 Answers1

1

I solved it using the rjson package rather than jsonlite since the latter seems to treat the year property independently from the rest of the data for some reason.

I used a nested lapply approach, not sure how it performs on larger JSON data but it does the job:

library(rjson)
l <- fromJSON(json)

# Loop across each state (use index so we can keep track of the state name)
df_list <- lapply(seq_along(l),
                  function(i) {
                    # inner loop over every year entry
                    inner_list <- lapply(l[[i]], function(y) {
                      data.frame(OuterVariable = names(l)[i],
                                 InnerData = y$InnerData,
                                 InnerList = names(y$InnerList),
                                 Cost = sapply(y$InnerList, "[[", 1),
                    })
                    
                    # Bind together each "mini data.frame" to one per outerVariable
                    do.call("rbind", c(inner_list, make.row.names = FALSE))
                  })

# Bind together the per-outerVariable data.frames
df_result <- do.call("rbind", df_list)

Where json is the sample data you provided above.

BeLambda
  • 887
  • 1
  • 10
  • 19