0

I have a table (lets call it TBL1), It has a Start and a End column on certain dates.

Table 2 (TBL2) is a created from pulling a years worth of data of sessions from Google Analytics.

Only variables in TBL2 is Dates(daily) and sessions(corresponding to each day).

Variables in TBL1 are StartDate, EndDate, sessions(empty)

I want to get the sum of the sessions from TBL2 between TBL1$StartDate & TBL1$EndDate and place the sum of sessions into TBL1$sessions.

Is this possible in R?

Example Code Below: Please note that TBL2 data is actually coming from Google Analytics. I also have over 100 start and end dates in my real problem spanning a year.


StartDate <-c("2017-01-01", "2017-01-09","2017-01-18", "2017-01-07")

EndDate <- c("2017-01-05", "2017-01-11", "2017-01-25", "2017-01-28" )

Sessions <- c(" ", " ", " ", " ")

TBL1 <- data.frame(StartDate, EndDate, Sessions)
as.Date(TBL1$StartDate)
as.Date(TBL1$EndDate)


StartDate   EndDate      Sessions
2017-01-01  2017-01-05          
2017-01-09  2017-01-11          
2017-01-18  2017-01-25          
2017-01-07  2017-01-28

Date <- c("2017-01-01","2017-01-02","2017-01-03","2017-01-04","2017-01-05","2017-01-06","2017-01-07","2017-01-08","2017-01-09","2017-01-10","2017-01-11","2017-01-12","2017-01-13","2017-01-14","2017-01-15","2017-01-16","2017-01-17","2017-01-18","2017-01-19","2017-01-20","2017-01-21","2017-01-22","2017-01-23","2017-01-24","2017-01-25","2017-01-26","2017-01-27","2017-01-28","2017-01-29","2017-01-30","2017-01-31")

sessions <- sample(200:5000,31)

TBL2 <- data.frame(Date, sessions)
as.Date(TBL2$Date)


Date      sessions
2017-01-01  1920            
2017-01-02  1276            
2017-01-03  1604            
2017-01-04  4283            
2017-01-05  4170            
2017-01-06  2870            
2017-01-07  2255            
2017-01-08  3660            
2017-01-09  290         
2017-01-10  4024
2017-01-11  1433            
2017-01-12  2168            
2017-01-13  2096            
2017-01-14  4649            
2017-01-15  836         
2017-01-16  3354            
2017-01-17  2366            
2017-01-18  1450            
2017-01-19  2067            
2017-01-20  4172            
2017-01-21  3081            
2017-01-22  3060            
2017-01-23  417         
2017-01-24  3422            
2017-01-25  2905            
2017-01-26  427         
2017-01-27  2163            
2017-01-28  2221            
2017-01-29  2350            
2017-01-30  3529            
2017-01-31  4156                

EndOutput <- data.frame(StartDate, EndDate, Session)


StartDate    EndDate   Session
2017-01-01  2017-01-05  13253       
2017-01-09  2017-01-11  5747        
2017-01-18  2017-01-25  20574       
2017-01-07  2017-01-28  49094

1 Answers1

0

I couldn't figure out how to vectorize this correctly, but here's a quick and dirty way to do this using lubridate's %within% operator:

library(lubridate)

# Convert our character dates into lubridate date-times.
TBL1$StartDate <- ymd(TBL1$StartDate)
TBL2$StartDate <- ymd(TBL1$StartDate)
TBL2$Date      <- ymd(TBL2$Date)

# Define a helper function to sum sessions between the start and end date of row x.
sum_of_sessions <- function (x) {sum(TBL2$sessions[TBL2$Date %within% interval(TBL1$StartDate[x], TBL1$EndDate[x])])}

# Store the results in TBL$Sessions.
for (i in nrow(TBL1)) {TBL1$Sessions[i] <- sum_of_sessions(i)}
jdonland
  • 189
  • 8
  • That gave me the biggest issues as well. Thank you for taking the time out of your day to provide the code above. – ShahidAziz Nov 06 '19 at 15:01