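In R, I am trying to create a MONTH column to plot my data against. It should be a running count of rows within each population, which I get by counting how many rows so far share the same value in another column (ORIG_ROW, which is constant within each population), ex: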
NAME ORIG_ROW MONTH
POP1 1 1
POP1 1 2
POP1 1 3
POP2 2 1
POP2 2 2
POP2 2 3
I am able to do this with:
df$MONTH <- sapply(1:nrow(df), function(i) colSums(df[1:i, 'ORIG_ROW', drop = FALSE] == df$ORIG_ROW[i]))
However, this code is too slow when I try to apply it to a large dataset (~825k observations), since every row rescans all of the rows above it.
Does anyone have suggestions on how to make this code more efficient?
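For reference, here is a minimal sketch of one vectorized direction, assuming df is a plain data frame shaped like the example table. It uses base R's ave() with seq_along to number the rows within each ORIG_ROW group in a single pass. The example data are just the table above retyped, and this has not been timed on the full dataset:

```r
# Example data copied from the table above (MONTH is the column to compute).
df <- data.frame(
  NAME     = rep(c("POP1", "POP2"), each = 3),
  ORIG_ROW = rep(c(1, 2), each = 3)
)

# ave() splits the row positions by ORIG_ROW and applies seq_along() to each
# group, giving a running row count per group without rescanning earlier rows.
df$MONTH <- ave(seq_along(df$ORIG_ROW), df$ORIG_ROW, FUN = seq_along)

print(df)
```

Like the sapply version, this counts rows with the same ORIG_ROW in their original order, so it should give the same result even if a population's rows are not contiguous.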