I am playing with the Kaggle Star Trek Scripts dataset but I am struggling with converting from json to a dataframe in R. Ideally I would convert it in a long form dataset with index columns for episodes and characters with their lines on individual rows. I did find this answer, however it is not in R.
Currently the json looks like the photo below. Sorry it is not a full exmaple, but I put a small mocked version below as well. If you want you can download the data yourselves from here.
Mock Example
"ENT": {
"episode_0": {
"KLAANG": {
"\"Pungghap! Pung ghap!\"": {},
"\"DujDajHegh!\"": {}
}
},
"eipsode_1": {
"ARCHER": {
"\"Warpme!\"": {},
"\"Toboldly go!\"": {}
}
}
}
}
The issue I have is that the second level, epsiodes, are individually numbered. So my regular bag of tricks for flattening by attribute name are not working. I am unsure how to loop through a level rather than an attribute name.
What I would ideally want is a long form data set that looks like this:
Series Episode Character Lines
ENT episode_0 KLAANG Pung ghap! Pung ghap!
ENT episode_0 KLAANG DujDaj Hegh!
ENT episode_1 ARCHER Warp me!
ENT episode_1 ARCHER To boldly go!
My currnet code looks like the below, which is what I would normally start with, but is obviously not working or far enough.
your_df <- result[["ENT"]] %>%
purrr::flatten() %>%
map_if(is_list, as_tibble) %>%
map_if(is_tibble, list) %>%
bind_cols()
I have also tried using stack()
and map_dfr()
but with no success. So I yet again come humbly to you, dear reader, for expertise. Json is the bane of my existance. I struggle with applying other answers to my circumstance so any advice or examples I can reverse engineer and lear from are most appreciated.
Also happy to clarify or expand on anything if possible.
-Jake