I want to manipulate, store and retrieve nested data in R, but to my surprise the nested data frame is substantially larger in memory than the flat one:
pacman::p_load(dplyr, tidytable)
test3 <- tibble(ID = 1:1e5) %>%
  group_by(ID) %>%
  summarise(number = 1:sample(1:4, size = 1), .groups = "drop") %>%
  mutate(Date = sample(seq.Date(from = as.Date("2021-01-01"),
                                to = as.Date("2021-12-31"), by = 1),
                       size = n(), replace = TRUE))
test4 <- test3 %>% nest_by(ID)
prettyNum(object.size(test3), big.mark = ",")  # 4 MB
prettyNum(object.size(test4), big.mark = ",")  # 132 MB
The same issue exists with tidytable.
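My guess is that the overhead comes from every nested element being a full data frame with its own attributes (names, class, row names). Here is a minimal base R sketch of that guess. It uses plain data frames in a list as a stand-in for the tibble list-column, and a smaller `n`, so the numbers will not match the ones above:

```r
# Base R only; exact sizes depend on the R version and platform.
set.seed(1)
n <- 1e4  # assumption: smaller than the 1e5 IDs above, to keep the run short

days <- seq.Date(as.Date("2021-01-01"), as.Date("2021-12-31"), by = 1)
flat <- data.frame(
  ID   = seq_len(n),
  Date = sample(days, size = n, replace = TRUE)
)

# One small data frame per ID, roughly what a list-column of data frames holds
nested <- lapply(seq_len(n), function(i) flat[i, "Date", drop = FALSE])

flat_size   <- as.numeric(object.size(flat))
nested_size <- as.numeric(object.size(nested))

format(object.size(flat), units = "MB")
format(object.size(nested), units = "MB")
nested_size / n  # approximate bytes per nested element
```

If this is right, the cost is mostly a fixed overhead per group, so it hurts most when there are many groups with only a few rows each, as in my example.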
Nesting is an appealing idea because it avoids duplicating data when the data is not naturally two-dimensional.
But that memory increase is problematic.
Furthermore, write_fst refuses to write data if there are nested columns, so I may need a different solution here as well.
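The only fallback I am aware of is base R's `saveRDS()`, which serializes arbitrary R objects, list-columns included. A minimal sketch with hypothetical toy data (not the `test4` object above); I have not compared its speed or file size against fst:

```r
# Toy nested data frame: one list-column holding a data frame per ID
nested <- data.frame(ID = 1:3)
nested$data <- lapply(1:3, function(i) {
  data.frame(Date = as.Date("2021-01-01") + seq_len(i))
})

path <- tempfile(fileext = ".rds")
saveRDS(nested, path)      # writes the whole object, list-column included
restored <- readRDS(path)

all.equal(nested, restored)  # compares the original with the round-tripped copy
```

This keeps the nested structure intact, but it does not offer the column-wise or row-range reads that make fst attractive.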
Do you have any suggestions?