
I need to compute the sum of the next 7 days' values of a column in R. The sum should be grouped by another column that holds string values.

Example

name   value 
a       2    
a       3  
a       3  
b       4  
b       3  
b       2  
b       1  
b       3  

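Summing each row with the next row in the same group (a window of 2 rows; the last row of a group has no next row and keeps its own value):

Expected output: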

sum
5
6
3
7
5
3     
4
3
Andre Elrico
Sowmya
    Hello @sowmya, welcome to SO. Check this topic to see how to compute the sum: https://stackoverflow.com/questions/19200841/consecutive-rolling-sums-in-a-vector-in-r; and this one on how to perform operations by group: https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group. If you still cannot solve your problem, edit your question to include your attempts. Also check the site guide on how to ask good questions: https://stackoverflow.com/help/how-to-ask. – Carlos Eduardo Lagosta Oct 10 '18 at 14:35

2 Answers

You can use dplyr's lead() and lag() to reference the next and previous values.

This code sums the current and the next, grouped by the string values:

library(dplyr)

df <- data.frame(stringsAsFactors=FALSE,
          V1 = c("a", "a", "a", "b", "b", "b", "b", "b"),
          V2 = c(2L, 3L, 3L, 4L, 3L, 2L, 1L, 3L)
)

df

df %>% 
  group_by(V1) %>% 
  mutate(sum_forward = dplyr::lead(V2) + V2)

And this is the output. The NAs appear in the last row of each group, where there is no next row to sum with.

  V1       V2 sum_forward
  <chr> <int>       <int>
1 a         2           5
2 a         3           6
3 a         3          NA
4 b         4           7
5 b         3           5
6 b         2           3
7 b         1           4
8 b         3          NA
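
If the last row of each group should keep its own value instead of NA, as the expected output in the question shows, lead() accepts a default argument. A minimal sketch, assuming the same df as above (the name result is only for illustration):

```r
library(dplyr)

# Same data as in the answer above
df <- data.frame(stringsAsFactors = FALSE,
                 V1 = c("a", "a", "a", "b", "b", "b", "b", "b"),
                 V2 = c(2L, 3L, 3L, 4L, 3L, 2L, 1L, 3L))

result <- df %>%
  group_by(V1) %>%
  # default = 0L stands in for the missing "next" value in each group's last row
  mutate(sum_forward = V2 + lead(V2, default = 0L)) %>%
  ungroup()

result$sum_forward
# 5 6 3 7 5 3 4 3
```

This only looks one row ahead. A 7-row window would need lead(V2, n = 1) through lead(V2, n = 6) added together, so a rolling-sum approach is probably more practical there.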
Jeremy K.

The zoo package is designed specifically for such tasks.

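library(zoo)

# The question's data
df1 <- data.frame(stringsAsFactors = FALSE,
                  name = c("a", "a", "a", "b", "b", "b", "b", "b"),
                  value = c(2L, 3L, 3L, 4L, 3L, 2L, 1L, 3L))

# Window of 2, left-aligned; fill supplies the group's last value where no full window exists
df1$new <- unlist(tapply(df1$value, factor(df1$name), function(x){ zoo::rollsum(x, 2, align = "left", fill = x[length(x)]) }))

#> df1$new
#[1] 5 6 3 7 5 3 4 3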

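# Extended example, including a group with a single row ("c")
df1 <- data.frame(stringsAsFactors = FALSE,
                  name = c("a", "a", "a", "b", "b", "b", "b", "b", "c", "d", "d", "d"),
                  value = c(2L, 3L, 3L, 4L, 3L, 2L, 1L, 3L, 4L, 1L:3L)
)

windowSize <- 3

# Note: tapply() returns the groups in sorted order, so this assumes df1 is already sorted by name
df1$new <- unlist(
    tapply(df1$value, factor(df1$name), function(x) {
        # Indices of the last (windowSize - 1) values, whose windows run past the end of the group
        IND <- (length(x) - (windowSize - 2)):length(x)
        IND <- IND[IND > 0]
        # Full windows from rollsum(), then sums of whatever values remain
        c(zoo::rollsum(x, windowSize, align = "left"), rev(cumsum(rev(x[IND]))))
    })
)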

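This was a little tricky to do. The code above is the general formula for a given windowSize: rollsum() covers the complete windows, and a reversed cumulative sum over the last windowSize - 1 values of each group fills in the shorter windows at the end.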
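
If the trailing windows should simply sum whatever rows remain in the group, including groups shorter than the window, the same result can be sketched in base R without relying on zoo's fill behaviour. This is only an illustrative alternative: forward_sum is a hypothetical helper name, not a zoo function, and the data is the extended df1 from above.

```r
# Hypothetical helper (not part of zoo): for each position, sum the value and
# up to (k - 1) following values, stopping at the end of the vector.
forward_sum <- function(x, k) {
  n <- length(x)
  cs <- c(0, cumsum(x))                # cs[i + 1] holds the sum of x[1..i]
  last <- pmin(seq_len(n) + k - 1, n)  # window end, clipped to the vector length
  cs[last + 1] - cs[seq_len(n)]
}

df1 <- data.frame(stringsAsFactors = FALSE,
                  name = c("a", "a", "a", "b", "b", "b", "b", "b", "c", "d", "d", "d"),
                  value = c(2L, 3L, 3L, 4L, 3L, 2L, 1L, 3L, 4L, 1L:3L))

windowSize <- 3

# ave() applies the helper within each name group and keeps the original row order
df1$new <- ave(df1$value, df1$name, FUN = function(x) forward_sum(x, windowSize))

df1$new
# 8 6 3 9 6 6 4 3 4 6 5 3
```

For the 7-day case in the question, windowSize <- 7 should do, assuming there is exactly one row per day within each group and the rows are in date order.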

Andre Elrico
  • Thanks! Can you modify the function so that if a group has only one row, that value is used as the sum, instead of NA or 0? – Sowmya Oct 10 '18 at 15:29
  • The code does not give the output 5 6 3; it seems to give 5 6 0 with my data. Can you help? – Sowmya Oct 10 '18 at 15:44
  • 1.8.4. I am summing 7 consecutive rows, and I am getting 0s wherever the group has only 6 rows remaining. – Sowmya Oct 10 '18 at 18:46
  • df1$new <- unlist(tapply(df1$value, factor(df1$name), function(x){ zoo::rollsum(x, 3, align = "left", fill = x[length(x)]) })) gives the output 8 3 3 9 6 6 3 3 4 6 3 3 for new; the second row should be 6. – Sowmya Oct 11 '18 at 05:53
  • Yes, it should sum the available values, i.e. if only 2 of the 3 rows in the window are present, it should sum those. – Sowmya Oct 11 '18 at 07:34