1

This is what my data looks like. The rightmost column is my Desired Column.

Name  EventType  EventDate  SalesAmount RunningTotal Runningtotal(prior365Days)
John    Email      1/1/2014      0          0            0
John    Sale       2/1/2014     10          10           10
John    Sale       7/1/2014     20          30           30
John    Sale       4/1/2015     30          60           50 
John    Webinar    5/1/2015      0          60           50
Tom     Email      1/1/2014      0          0            0
Tom     Sale       2/1/2014     15          15           15
Tom     Sale       7/1/2014     10          25           25
Tom     Sale       4/1/2015     25          50           35 
Tom     Webinar    5/1/2015      0          50           35

I am just trying to get the running total of SalesAmount for each name in the last 365 days window. For the general "RunningTotal" column I used:

df<- df%>%
   group_by (Name)%>%
    mutate(RunningTotal = cumsum(SalesAmount))

But I dont know how to get the running total only in the last 365 day window.Kindly help. Your help is sincerely appreciated!

gibbz00
  • 1,947
  • 1
  • 19
  • 31
  • This dplyr vignette looks like a good place to start http://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html Neither dplyr nor data.table offers special tools for efficient rolling window calculations. (E.g., 2012 post by data.table author: http://stackoverflow.com/a/12157723/1191259 ) – Frank May 29 '15 at 18:26
  • 1
    @Frank Thank you! It was very helpful. – gibbz00 Jun 01 '15 at 21:35

1 Answers1

1
df$EventDate <- as.Date(df$EventDate, format="%d/%m/%Y")
df <- df %>%
   group_by (Name) %>%
   arrange(EventDate) %>% 
   mutate(day = EventDate - EventDate[1])

f <- Vectorize(function(i)
    sum(df[df$Name[i] == df$Name & df$day[i] - df$day >= 0 & 
             df$day[i] - df$day <= 365, "SalesAmount"]), vec="i")
df$RunningTotal365 <- f(1:nrow(df))
Rorschach
  • 31,301
  • 5
  • 78
  • 129
  • Thanks a lot. It works perfectly although I have no idea whats going on inside that function f. Is the "i" symbol comparing each row against the previous row? – gibbz00 May 29 '15 at 19:25
  • @gibbz00 the `i` is just like the index in a `for` loop. `Vectorize` just allows you to pass a vector to the function `f` so you don't need an explicit loop. – Rorschach May 29 '15 at 19:28
  • @frank Thanks a lot. However df$RunningTotal365 <- f(1:nrow(df)) is taking a long time as I have 1.5 million rows. Can I use any "apply" family of functions here to speed things up? Kindly let me know. – gibbz00 Jun 01 '15 at 21:30
  • @gibbz00 I think you accidentally @ pinged me, though this isn't my answer. You don't need to @ ping 6pool, since they get a notification for all comments on their post. Anyway, you're probably looking for `rollapply` from the zoo package if you need efficiency. I don't have experience with it, myself. – Frank Jun 01 '15 at 21:38
  • @Frank Thanks a lot Frank for giving me a lead on this and pinging you was intentional. – gibbz00 Jun 01 '15 at 21:40