Cumulative summing between groups using dplyr

Question

I have a tibble structured as follows:

   day  theta
1   1    2.1
2   1    2.1
3   2    3.2
4   2    3.2
5   5    9.5
6   5    9.5
7   5    9.5

Note that the tibble contains multiple rows for each day, and for each day the same value for theta is repeated an arbitrary number of times. (The tibble contains other arbitrary columns necessitating this repeating structure.)

I'd like to use dplyr to cumulatively sum values for theta across days such that, in the example above, 2.1 is added only a single time to 3.2, etc. The tibble would be mutated so as to append the new cumulative sum (c.theta) as follows:

   day  theta  c.theta
1   1    2.1     2.1
2   1    2.1     2.1
3   2    3.2     5.3
4   2    3.2     5.3
5   5    9.5     14.8
6   5    9.5     14.8
7   5    9.5     14.8 
...

My initial efforts to group_by day and then cumsum over theta resulted only in cumulative summing over the full set of data (e.g., 2.1 + 2.1 + 3.2 ...) which is undesirable. In my Stack Overflow searches, I can find many examples of cumulative summing within groups, but never between groups, as I describe above. Nudges in the right direction would be much appreciated.

score 3 · Accepted Answer · answered Oct 24 '17 at 23:25

3

Doing this in dplyr I came up with a very similar solution to PoGibas - use distinct to just get one row per day, find the sum and merge back in:

df = read.table(text="day  theta
1   1    2.1
2   1    2.1
3   2    3.2
4   2    3.2
5   5    9.5
6   5    9.5
7   5    9.5", header = TRUE)

cumsums = df %>%
    distinct(day, theta) %>%
    mutate(ctheta = cumsum(theta))

df %>%
    left_join(cumsums %>% select(day, ctheta), by = 'day')

answered Oct 24 '17 at 23:25

Marius

58,213
16
107
105

Excellent! The `distinct` function will come in handy. – mduprey Oct 24 '17 at 23:42

pogibas · Answer 2 · 2017-10-24T23:26:02.177

Not a dplyr, but just an alternative data.table solution:

library(data.table)
# Original table is called d
setDT(d)
merge(d, unique(d)[, .(c.theta = cumsum(theta), day)])

   day theta c.theta
1:   1   2.1     2.1
2:   1   2.1     2.1
3:   2   3.2     5.3
4:   2   3.2     5.3
5:   5   9.5    14.8
6:   5   9.5    14.8
7:   5   9.5    14.8

PS: If you want to preserve other columns you have to use unique(d[, .(day, theta)])

lmo · Answer 3 · 2017-10-25T18:39:13.170

In base R you could use split<- and tapply to return the desired result.

# construct 0 vector to fill in
dat$temp <- 0
# fill in with cumulative sum for each day
split(dat$temp, dat$day) <- cumsum(tapply(dat$theta, dat$day, head, 1))

Here, tapply returns the first element of theta for each day which is is fed to cumsum. The elements of cumulative sum are fed to each day using split<-.

This returns

dat
  day theta temp
1   1   2.1  2.1
2   1   2.1  2.1
3   2   3.2  5.3
4   2   3.2  5.3
5   5   9.5 14.8
6   5   9.5 14.8
7   5   9.5 14.8

Cumulative summing between groups using dplyr

3 Answers3