I currently have a large data table and I would like to accumulate a vector column (the classes
column) for each group (id) along the years to get all past classes up to the current year in vector format.
EDIT: Previous topics (ie Cumulatively paste (concatenate) values grouped by another variable) have answerd this question in the case of characters concatenation which I don't want (because analyzing strings forces me to parse it before, which is cumputer intensive on large datasets). I would like to accumulate the vectors and get a column of vectors as well. I think the solution is pretty close but I just can't manage to find the right syntax for it.
Sample data:
id year classes
----------------------------
1 2000 c("A", "B")
1 2001 c("C", "A")
1 2002 "D"
1 2003 "E"
2 2001 "A"
2 2002 c("A", "D")
2 2003 "E"
...
Expected output :
id year classes cumclasses
-----------------------------------------------------------
1 2000 c("A", "B") c("A", "B")
1 2001 c("C", "A") c("A", "B", "C", "A")
1 2002 "D" c("A", "B", "C", "A", "D")
1 2003 "E" c("A", "B", "C", "A", "D", "E")
2 2001 "A" "A"
2 2002 c("A", "D") c("A", "A", "D")
2 2003 "E" c("A", "A", "D", "E")
...
My goal is to find an efficient solution because my dataset is fairly large.
For now I have a working (but ultra slow) solution using dplyr and purrr :
dt2 <- dt %>%
setkeyv(c("id", "year")) %>%
group_by(id) %>%
mutate(cumclasses = accumulate(classes, append))
I'm looking for a data.table solution of the type:
#not working example
dt2 <- dt[, cumclasses := accumulate(classes, append), by = id]
or even a base R solution, the faster the better !
Thank you!
If you want to reproduce sample data please copy the following code:
dt <- data.table(id = c(1,1,1,1,2,2,2), year = c(2000,2001,2002,2003,2001,2002,2003), classes = list(c('A', 'B'), c('C', 'A'), 'D', 'E', 'A', c('A', 'D'), 'E'), key = 'id')
EDIT [SOLVED]:
A working solution is (using data.table
and purrr
):
dt[, cumClasses := list(accumulate(classes, append)), by = id]