3

I have this table that has wide range of dates and a corresponding value for each one of those dates, an example shown below.

Date        Value   
6/01/2013   8   
6/02/2013   4   
6/03/2013   1   
6/04/2013   7   
6/05/2013   1   
6/06/2013   1   
6/07/2013   3   
6/08/2013   8   
6/09/2013   4   
6/10/2013   2   
6/11/2013   10  
6/12/2013   4   
6/13/2013   7   
6/14/2013   3   
6/15/2013   2   
6/16/2013   1   
6/17/2013   7   
6/18/2013   5   
6/19/2013   1   
6/20/2013   4   

What I am trying to do is create a query that will create a new column that will display the sum of the Value’s column for a specified date range. For example down below, the sum column contains the sum of its corresponding date going back one full week. So the Sum of the date 6/9/2013 would be the sum of the values from 6/03/2013 to 6/09/2013.

Date        Sum
6/01/2013   8
6/02/2013   12
6/03/2013   13
6/04/2013   20
6/05/2013   21
6/06/2013   22
6/07/2013   25
6/08/2013   25
6/09/2013   25
6/10/2013   26
6/11/2013   29
6/12/2013   32
6/13/2013   38
6/14/2013   38
6/15/2013   32
6/16/2013   29
6/17/2013   34
6/18/2013   29 
6/19/2013   26
6/20/2013   23

I’ve tried to using the LIMIT clause but I could not get it to work, any help would be greatly appreciated.

Farrel
  • 10,244
  • 19
  • 61
  • 99
  • 1
    I am trying to do exactly the same with data.table. I can "think" it but I cannot do it. One needs a .SD to chunk each running week but the i argument needs to change for each row. Let me see if can bring the data.table in? – Farrel Jul 13 '14 at 22:42
  • 1
    Here is a ```data.table``` solution http://stackoverflow.com/questions/24397299/rolling-sum-by-another-variable-in-r/24400600#24400600 – Mike.Gahan Jul 13 '14 at 23:49

3 Answers3

3

zoo has a function rollapply which can do what you need:

z <- zoo(x$Value, order.by=x$Date)

rollapply(z, width = 7, FUN = sum, partial = TRUE, align = "right")
## 2013-06-01                                                                   8
## 2013-06-02                                                                  12
## 2013-06-03                                                                  13
## 2013-06-04                                                                  20
## 2013-06-05                                                                  21
## 2013-06-06                                                                  22
## 2013-06-07                                                                  25
## 2013-06-08                                                                  25
## 2013-06-09                                                                  25
## 2013-06-10                                                                  26
## 2013-06-11                                                                  29
## 2013-06-12                                                                  32
## 2013-06-13                                                                  38
## 2013-06-14                                                                  38
## 2013-06-15                                                                  32
## 2013-06-16                                                                  29
## 2013-06-17                                                                  34
## 2013-06-18                                                                  29
## 2013-06-19                                                                  26
## 2013-06-20                                                                  23
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
  • But what would happen if every day did not have its own row? In other words, some days may have no data. Or what would happen if some days had two rows? Your solution works since it takes the last 7 rows but it is presuming that that represents a time range? Is there a way to do this but by calculating a time range? – Farrel Jul 14 '14 at 17:49
1

Using data.table

require(data.table)

#Build some sample data
data <- data.table(Date=1:20,Value=rpois(20,10))

#Build reference table
Ref <- data[,list(Compare_Value=list(I(Value)),Compare_Date=list(I(Date)))]

#Use lapply to get last seven days of value by id
data[,Roll.Val := lapply(Date, function(x) {
                  d <- as.numeric(Ref$Compare_Date[[1]] - x)
                  sum((d <= 0 & d >= -7)*Ref$Compare_Value[[1]])})]

head(data,10)

    Date Value Roll.Val
 1:    1    14       14
 2:    2     7       21
 3:    3     9       30
 4:    4     5       35
 5:    5    10       45
 6:    6    10       55
 7:    7    15       70
 8:    8    14       84
 9:    9     8       78
10:   10    12       83

Here is another solution if anyone is interested:

library("devtools")
install_github("boRingTrees","mgahan")
require(boRingTrees)
rollingByCalcs(data,dates="Date",target="Value",stat=sum,lower=0,upper=7)
Mike.Gahan
  • 4,565
  • 23
  • 39
  • another way, seems to be much faster, described in http://stackoverflow.com/a/27983553/2490497 – jangorecki Jan 23 '15 at 13:57
  • Nice work! But another component to this is memory use. Not sure how those compare. For big datasets, this is a critical point. I think this is promising. – Mike.Gahan Jan 23 '15 at 14:00
0

Here is one way of doing it

> input <- read.table(text = "Date        Value   
+ 6/01/2013   8   
+ 6/02/2013   4   
+ 6/03/2013   1   
+ 6/04/2013   7   
+ 6/05/2013   1   
+ 6/06/2013   1   
+ 6/07/2013   3   
+ 6/08/2013   8   
+ 6/09/2013   4   
+ 6/10/2013   2   
+ 6/11/2013   10  
+ 6/12/2013   4   
+ 6/13/2013   7   
+ 6/14/2013   3   
+ 6/15/2013   2   
+ 6/16/2013   1   
+ 6/17/2013   7   
+ 6/18/2013   5   
+ 6/19/2013   1   
+ 6/20/2013   4   ", as.is = TRUE, header = TRUE)
> input$Date <- as.Date(input$Date, format = "%m/%d/%Y")  # convert Date
> 
> # create a sequence that goes a week back from the current data
> x <- data.frame(Date = seq(min(input$Date) - 6, max(input$Date), by = '1 day'))
> 
> # merge
> merged <- merge(input, x, all = TRUE)
> 
> # replace NAs with zero
> merged$Value[is.na(merged$Value)] <- 0L
> 
> # use 'filter' for the running sum and delete first 6
> input$Sum <- filter(merged$Value, rep(1, 7), sides = 1)[-(1:6)]
> input
         Date Value Sum
1  2013-06-01     8   8
2  2013-06-02     4  12
3  2013-06-03     1  13
4  2013-06-04     7  20
5  2013-06-05     1  21
6  2013-06-06     1  22
7  2013-06-07     3  25
8  2013-06-08     8  25
9  2013-06-09     4  25
10 2013-06-10     2  26
11 2013-06-11    10  29
12 2013-06-12     4  32
13 2013-06-13     7  38
14 2013-06-14     3  38
15 2013-06-15     2  32
16 2013-06-16     1  29
17 2013-06-17     7  34
18 2013-06-18     5  29
19 2013-06-19     1  26
20 2013-06-20     4  23
>