-3

For some website data, I have a dataframe with a column of dates and a column of visitors. I wish to add some rolling aggregate columns based on the day of the week.

I'm trying to aggregate (mean, median, sum, count) by day of the week, so day x would be grouped with dayx - 7, day x - 14 ... day x - 7*n where n is the number of weeks required in the window, where the min data in the data will go back further than 7*n.

For example, if the last 5 Fridays saw visitor levels of 100, 110, 120, 130, 160 then then entry against the most recent friday for median value over 3 weeks would be 130, and 136.67 for mean over past 3 Fridays.

Sample dataset in:

structure(list(visit_date = structure(1:20, .Label = c("01-01-16", 
"01-02-16", "01-03-16", "01-04-16", "01-05-16", "01-06-16", "01-07-16", 
"01-08-16", "01-09-16", "01-10-16", "01-11-16", "01-12-16", "01-13-16", 
"01-14-16", "01-15-16", "01-16-16", "01-17-16", "01-18-16", "01-19-16", 
"01-20-16"), class = "factor"), visitors = c(114L, 158L, 153L, 
157L, 192L, 128L, 197L, 146L, 123L, 127L, 170L, 126L, 106L, 112L, 
119L, 184L, 186L, 171L, 183L, 125L)), .Names = c("visit_date", 
"visitors"), class = "data.frame", row.names = c(NA, -20L))

ideal output for sum()

newdf <- structure(list(visit_date = structure(1:20, .Label = c("01-01-16", 
"01-02-16", "01-03-16", "01-04-16", "01-05-16", "01-06-16", "01-07-16", 
"01-08-16", "01-09-16", "01-10-16", "01-11-16", "01-12-16", "01-13-16", 
"01-14-16", "01-15-16", "01-16-16", "01-17-16", "01-18-16", "01-19-16", 
"01-20-16"), class = "factor"), visitors = c(114L, 158L, 153L, 
157L, 192L, 128L, 197L, 146L, 123L, 127L, 170L, 126L, 106L, 112L, 
119L, 184L, 186L, 171L, 183L, 125L), sum_visitors = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 379L, 465L, 466L, 
498L, 501L, 359L)), .Names = c("visit_date", "visitors", "sum_visitors"
), class = "data.frame", row.names = c(NA, -20L))

I've looked at rollapply but am unsure how to roll this out by row in the data frame.

Hope this makes sense, thanks in advance.

  • 1
    Would you please look at http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and post sample data etc.? – m-dz Jun 23 '16 at 10:43
  • Thanks for the guidance - first post so it's helpful. Hope the edit is suitable for SO. – pawfounder Jun 23 '16 at 12:32

1 Answers1

0

OK, my best guess (using data.table to simplify grouping operations):

require(data.table)
require(lubridate)
require(zoo)

dt <- data.table(visit_date = c("01-01-16", "01-02-16", "01-03-16", "01-04-16", "01-05-16", "01-06-16", "01-07-16", "01-08-16", "01-09-16", "01-10-16", "01-11-16", "01-12-16", "01-13-16", "01-14-16", "01-15-16", "01-16-16", "01-17-16", "01-18-16", "01-19-16", "01-20-16"),
                 visitors = c(114L, 158L, 153L, 157L, 192L, 128L, 197L, 146L, 123L, 127L, 170L, 126L, 106L, 112L, 119L, 184L, 186L, 171L, 183L, 125L))


dt[, visit_date := mdy(visit_date)]
dt[, week_day := weekdays(visit_date)]

n_weeks <- 2

dt[, sum_visitors := rollsum(visitors, n_weeks, align = "right", fill = NA), by = week_day]
dt[, sum_visitors_V2 := rollapply(visitors, n_weeks, sum, align = "right", fill = NA), by = week_day]

Output:

#     visit_date visitors  week_day sum_visitors sum_visitors_V2
#  1: 2016-01-01      114    Friday           NA              NA
#  2: 2016-01-02      158  Saturday           NA              NA
#  3: 2016-01-03      153    Sunday           NA              NA
#  4: 2016-01-04      157    Monday           NA              NA
#  5: 2016-01-05      192   Tuesday           NA              NA
#  6: 2016-01-06      128 Wednesday           NA              NA
#  7: 2016-01-07      197  Thursday           NA              NA
#  8: 2016-01-08      146    Friday          260             260
#  9: 2016-01-09      123  Saturday          281             281
# 10: 2016-01-10      127    Sunday          280             280
# 11: 2016-01-11      170    Monday          327             327
# 12: 2016-01-12      126   Tuesday          318             318
# 13: 2016-01-13      106 Wednesday          234             234
# 14: 2016-01-14      112  Thursday          309             309
# 15: 2016-01-15      119    Friday          265             265
# 16: 2016-01-16      184  Saturday          307             307
# 17: 2016-01-17      186    Sunday          313             313
# 18: 2016-01-18      171    Monday          341             341
# 19: 2016-01-19      183   Tuesday          309             309
# 20: 2016-01-20      125 Wednesday          231             231
m-dz
  • 2,342
  • 17
  • 29