I've been trying to solve this issue for too long now. I have binary insect outbreak data in annual time series format for 300+ years (rows) and 70+ trees (columns).
I'd like to conditionally fill a dataframe / matrix / data table of the same dimensions with cumulative sums, and have it reset to 0 at the end of each outbreak period. I've found a wealth of similar questions / answers that I just can't seem to translate to my issue.
For example, a snippet of the data frame looks like this:
     t1 t2 t3 t4 t5
2000  1  0  0  1  0
2001  1  0  0  0  1
2002  1  1  0  0  1
2003  0  1  0  1  1
2004  1  1  1  1  1
And I want to create a new df that looks like this:
     t1 t2 t3 t4 t5
2000  1  0  0  1  0
2001  2  0  0  0  1
2002  3  1  0  0  2
2003  0  2  0  1  3
2004  1  3  1  2  4
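To make the target rule concrete, here is a minimal base R sketch of one way the table above could be produced. It assumes the data are a plain data frame of 0/1 values with no missing values; `df`, `out`, and the helper `run_count` are names made up for illustration.

```r
# Example data copied from the snippet above (years as row names).
df <- data.frame(
  t1 = c(1, 1, 1, 0, 1),
  t2 = c(0, 0, 1, 1, 1),
  t3 = c(0, 0, 0, 0, 1),
  t4 = c(1, 0, 0, 1, 1),
  t5 = c(0, 1, 1, 1, 1),
  row.names = as.character(2000:2004)
)

# Hypothetical helper: running count of consecutive 1s that restarts at each 0.
# cumsum(x == 0) increases at every 0, so each 0 opens a new group and the
# within-group cumulative sum starts over there. Assumes x has no NA values.
run_count <- function(x) ave(x, cumsum(x == 0), FUN = cumsum)

out <- df
out[] <- lapply(df, run_count)  # assigning into out[] keeps row and column names
print(out)
```

Because the helper works on one column at a time, the same `lapply()` call should scale to the full 300+ by 70+ table without an explicit loop over rows.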
I feel I've gotten close with both the data.table package and the base R rle function, although I've also been going around in circles (I'm pretty sure I got it working for a single column once, but now I can't remember what I did, or why I couldn't get it to work in a loop over all columns).
I've only ever gotten the following methods to work to some extent: usually for just a single column, or they simply add the df to a shifted copy of itself, so a single column might look like 0 1 2 2 1 0 instead of 0 1 2 3 4 0. Some attempts, in case this helps, have been variations on code like this:
setDT(dt)[, new := t1 + shift(t1, fill = 0)]

apply(rle(matrix)$lengths, 2, seq)

rle(matrix[, 1])$lengths

for (i in 1:dim(dt)[1]) {
  for (j in 1:dim(dt)[2]) {
    cols <- names(dt)  # tried in place of .SD with negative results
    if (dt[i, j] == 1) {
      dt[, new := .SD + shift(.SD, 1L, fill = 0, type = "lag", give.names = TRUE)]
    } else {
      dt
    }
  }
}
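For comparison with the rle attempts, here is a hedged sketch of how run lengths could be turned into the counts I'm after for a single column. The helper name `run_count_rle` is made up for illustration, and it assumes a 0/1 vector with no missing values.

```r
# Hypothetical helper: rle() gives the length of each run of identical values,
# sequence() numbers the positions within each run (1, 2, 3, ...), and
# multiplying by x sets the positions that fall inside runs of 0 back to 0.
run_count_rle <- function(x) x * sequence(rle(x)$lengths)

x <- c(0, 1, 1, 1, 1, 0)
run_count_rle(x)
# 0 1 2 3 4 0

# Column-wise over a 0/1 matrix (m is a made-up two-column example).
m <- cbind(t1 = c(1, 1, 1, 0, 1), t5 = c(0, 1, 1, 1, 1))
res <- apply(m, 2, run_count_rle)
```

Note that rle() takes a vector, so it has to be called on one column at a time rather than on the whole matrix.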
Some of the main SO sources I've used include these pages: data.table, dplyr, rle
Let me know if I'm missing any important info (I'm new!), and thank you so much for any help!