
I have a large time-series data set with 8 key data variables. For each row, I essentially look up the same combination in the year before and build a cumulative sum.

I have code that works, but it is incredibly slow. Is there a more efficient way to do this?

EDIT: My code wasn't reproducible; I am sorry for that. After more effort than I am proud of, it should be now.

Source data

Data <- data.frame(
  V1 = rep(1:50, 25*3*20),
  V2 = rep(1:25, each= 50, 3*20),
  V3 = rep(1:3, each=50*25, 20),
  Year = rep(1:20, each=50*25*3),
  YearTotal = rnorm(75000), 
  CumulativeSum = 0)

Working but very slow code

for (i in 1:75000){
  Data$CumulativeSum[i] <- Data$YearTotal[i] +
    ifelse(Data$Year[i] == 1, 0,
           Data$CumulativeSum[Data$V1 == Data$V1[i] &
                              Data$V2 == Data$V2[i] &
                              Data$V3 == Data$V3[i] &
                              Data$Year + 1 == Data$Year[i]])
}

My actual data set is quite a bit larger and has character and period variables, and the calculation takes over 30 minutes. I couldn't find any similar problems on StackOverflow; maybe I didn't know what to search for. I can't help but believe there must be a more elegant and faster way to get it done.

Novo88

1 Answer


You can use dplyr to speed up your code:

library(dplyr)

Data <- Data %>% 
  group_by(V1, V2, V3) %>% 
  arrange(Year) %>% 
  mutate(CumulativeSum = cumsum(YearTotal))

By grouping by V1, V2, and V3 we ensure that the cumulative sum runs over each combination separately. Then we arrange by Year and take the cumulative sum.
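As a quick sanity check, a sketch on a small toy data set (the sizes here are made up, not the question's) showing that the grouped cumulative sum reproduces the row-by-row loop exactly:

```r
library(dplyr)

# Toy version of the question's data: 2 x 2 groups over 3 years
set.seed(1)
toy <- data.frame(
  V1 = rep(1:2, 2 * 3),
  V2 = rep(1:2, each = 2, times = 3),
  Year = rep(1:3, each = 4),
  YearTotal = rnorm(12),
  CumulativeSum = 0
)

# The question's loop: add the previous year's total for the same group
for (i in seq_len(nrow(toy))) {
  toy$CumulativeSum[i] <- toy$YearTotal[i] +
    ifelse(toy$Year[i] == 1, 0,
           toy$CumulativeSum[toy$V1 == toy$V1[i] &
                             toy$V2 == toy$V2[i] &
                             toy$Year + 1 == toy$Year[i]])
}

# Vectorised version, then restored to the original row order
check <- toy %>%
  group_by(V1, V2) %>%
  arrange(Year, .by_group = TRUE) %>%
  mutate(GroupedSum = cumsum(YearTotal)) %>%
  ungroup() %>%
  arrange(Year, V2, V1)

all.equal(toy$CumulativeSum, check$GroupedSum)  # TRUE
```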

Or with data.table:

library(data.table)

setDT(Data)
Data[order(Year), CumulativeSum := cumsum(YearTotal), by = .(V1, V2, V3)]
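If you'd rather avoid extra packages, base R's `ave()` can do the same grouped cumulative sum. A sketch on a smaller, made-up version of the example data:

```r
# Small, hypothetical version of the question's data for illustration
set.seed(1)
Data <- data.frame(
  V1 = rep(1:5, 3 * 2 * 4),
  V2 = rep(1:3, each = 5, times = 2 * 4),
  V3 = rep(1:2, each = 5 * 3, times = 4),
  Year = rep(1:4, each = 5 * 3 * 2),
  YearTotal = rnorm(120)
)

# Make sure years are consecutive within each group, then
# ave() applies cumsum within each V1/V2/V3 combination
Data <- Data[order(Data$V1, Data$V2, Data$V3, Data$Year), ]
Data$CumulativeSum <- ave(Data$YearTotal, Data$V1, Data$V2, Data$V3,
                          FUN = cumsum)
```

This is typically slower than data.table on large data, but it keeps the dependency footprint at zero.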
kath