I have a nested list, read from a JSON file, that stores logging info from a video game. The time element of each entry is a simple vector, while inputManagerStates and syncedProperties are lists that may contain 0 or more elements.
This is a follow-up to this question, where, with some help, I managed to get the data into rectangular format. Unfortunately, I have a lot of such JSON files, and unnest_wider seems to run quite slowly.
The list:
test_list <-
list(list(time = 9.92405605316162, inputManagerStates = list(),
syncedProperties = list()), list(time = 9.9399995803833,
inputManagerStates = list(list(inputId = "InputY", buttonState = FALSE,
axisValue = 0), list(inputId = "InputX", buttonState = FALSE,
axisValue = 0.0501395985484123), list(inputId = "xPos",
buttonState = FALSE, axisValue = 5), list(inputId = "yPos",
buttonState = FALSE, axisValue = 0.0799999982118607),
list(inputId = "zPos", buttonState = FALSE, axisValue = 0),
list(inputId = "xRot", buttonState = FALSE, axisValue = 0),
list(inputId = "yRot", buttonState = FALSE, axisValue = -0.70664256811142),
list(inputId = "zRot", buttonState = FALSE, axisValue = 0),
list(inputId = "wRot", buttonState = FALSE, axisValue = 0.707570731639862)),
syncedProperties = list(list(name = "timeStamp", value = "97,2"))),
list(time = 9.95659446716309, inputManagerStates = list(list(
inputId = "InputY", buttonState = FALSE, axisValue = 0),
list(inputId = "InputX", buttonState = FALSE, axisValue = 0.0993990004062653),
list(inputId = "xPos", buttonState = FALSE, axisValue = 5),
list(inputId = "yPos", buttonState = FALSE, axisValue = 0.0799999982118607),
list(inputId = "zPos", buttonState = FALSE, axisValue = 0),
list(inputId = "xRot", buttonState = FALSE, axisValue = 0),
list(inputId = "yRot", buttonState = FALSE, axisValue = -0.705721318721771),
list(inputId = "zRot", buttonState = FALSE, axisValue = 0),
list(inputId = "wRot", buttonState = FALSE, axisValue = 0.708489596843719)),
syncedProperties = list(list(name = "timeStamp", value = "97,21667"))),
list(time = 20.0626411437988, inputManagerStates = list(list(
inputId = "InputY", buttonState = FALSE, axisValue = 0.601816594600677),
list(inputId = "InputX", buttonState = FALSE, axisValue = 0),
list(inputId = "xPos", buttonState = FALSE, axisValue = -1.31777036190033),
list(inputId = "yPos", buttonState = FALSE, axisValue = 0.0800001174211502),
list(inputId = "zPos", buttonState = FALSE, axisValue = 6.08214092254639),
list(inputId = "xRot", buttonState = FALSE, axisValue = 0),
list(inputId = "yRot", buttonState = FALSE, axisValue = -0.391442984342575),
list(inputId = "zRot", buttonState = FALSE, axisValue = 0),
list(inputId = "wRot", buttonState = FALSE, axisValue = 0.920202374458313)),
syncedProperties = list(list(name = "timeStamp", value = "107,3167"),
list(name = "previousGameState", value = "1"), list(
name = "newGameState", value = "2"))))
Code I am using to rectangularize the list:
library(tidyverse)
output_df <-
  test_list %>%
  tibble::enframe(name = "frame", value = "value") %>%
  tidyr::unnest_wider(value) %>%
  tidyr::unnest(inputManagerStates, keep_empty = TRUE) %>%
  tidyr::unnest(syncedProperties, keep_empty = TRUE) %>%
  tidyr::unnest_wider(syncedProperties) %>%
  tidyr::unnest_wider(inputManagerStates)
output_df
#> # A tibble: 46 x 7
#> frame time inputId buttonState axisValue name value
#> <int> <dbl> <chr> <lgl> <dbl> <chr> <chr>
#> 1 1 9.92 <NA> NA NA <NA> <NA>
#> 2 2 9.94 InputY FALSE 0 timeStamp 97,2
#> 3 2 9.94 InputX FALSE 0.0501 timeStamp 97,2
#> 4 2 9.94 xPos FALSE 5 timeStamp 97,2
#> 5 2 9.94 yPos FALSE 0.0800 timeStamp 97,2
#> 6 2 9.94 zPos FALSE 0 timeStamp 97,2
#> 7 2 9.94 xRot FALSE 0 timeStamp 97,2
#> 8 2 9.94 yRot FALSE -0.707 timeStamp 97,2
#> 9 2 9.94 zRot FALSE 0 timeStamp 97,2
#> 10 2 9.94 wRot FALSE 0.708 timeStamp 97,2
#> # ... with 36 more rows
Created on 2022-08-24 with reprex v2.0.2
For my data, unnest is fairly fast, but unnest_wider is quite slow. The first unnest_wider(value) can easily be written in base R as cbind(., do.call("rbind", .$value)), which is much faster:
microbenchmark::microbenchmark(
  unnest_wider =
    test_list %>%
    tibble::enframe(name = "frame", value = "value") %>%
    tidyr::unnest_wider(value),
  baser_r =
    test_list %>%
    tibble::enframe(name = "frame", value = "value") %>%
    cbind(., do.call("rbind", .$value)) %>%
    select(-value)
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> unnest_wider 3.1446 3.34645 4.031113 3.63625 4.22770 10.5289 100 b
#> baser_r 1.4005 1.48225 1.770210 1.63475 1.86465 5.0407 100 a
Created on 2022-08-24 with reprex v2.0.2
I am looking for a way to replace %>% tidyr::unnest_wider(syncedProperties) %>% tidyr::unnest_wider(inputManagerStates) with faster code, but the cbind solution doesn't work there because the result of do.call("rbind", ...) has a different number of rows than the data frame.
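To make the row-count problem concrete, here is a minimal base R sketch of one possible direction. It is not benchmarked against unnest_wider. It assumes that frames without entries show up as NULL elements in the list-column after unnest(..., keep_empty = TRUE), and that every record has the same, known fields. The helper widen_records and its template argument are hypothetical names introduced only for this illustration; they are not part of tidyr or any other package.

```r
# Hypothetical helper (not from any package): widen a list of records into a
# data frame with one row per list element. NULL elements become rows of NA,
# so the row count always matches length(x).
# `template` names the expected fields and supplies a typed NA for each one.
widen_records <- function(x, template) {
  cols <- lapply(names(template), function(field) {
    vapply(
      x,
      function(rec) if (is.null(rec)) template[[field]] else rec[[field]],
      FUN.VALUE = template[[field]],
      USE.NAMES = FALSE
    )
  })
  names(cols) <- names(template)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Small stand-in for the inputManagerStates list-column (assumption: the
# first frame had no input states, so its element is NULL).
states <- list(
  NULL,
  list(inputId = "InputY", buttonState = FALSE, axisValue = 0),
  list(inputId = "InputX", buttonState = FALSE, axisValue = 0.0501395985484123)
)

# do.call("rbind", ...) silently drops the NULL element: 2 rows instead of 3.
n_rbind <- nrow(do.call("rbind", states))

# The helper keeps one row per element, with NA for the empty frame.
wide <- widen_records(
  states,
  template = list(inputId = NA_character_, buttonState = NA, axisValue = NA_real_)
)
```

Because wide has exactly one row per list element, it could in principle be combined with the remaining columns via cbind in the same way as the first step. Whether this is actually faster on the real files would need to be measured.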
EDIT: I think this may be possible with unnest::unnest(), but I couldn't achieve the desired structure with it (while tidytable::unnest_wider currently supports only vectors).