Frequency based on unique date (alternative to a for loop) R

Question

I have a large sales dataset (> 1 M rows) that contains 2 columns:

Client_ID
Purchase_Date

Client ID often repeats.

For each row, I would like to calculate "How many times has the client purchased in the last 30 days."

Client_ID
Purchase_Date
Freq_Last_30_Days

Currently, I loop through each row and do a nrow() count using the following criteria:

data$Client_ID == data$Client_ID[i]
data$Purchase_Date < data$Purchase_Date[i]
data$Purchase_Date >= data$Purchase_Date[i] - 30

Given the database size, this is taking over hours to do. Is there a more efficient way of finding the frequency without using a for loop?

Also, you could check [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — akrun, Jan 09 '17 at 02:49

score 0 · Answer 1 · answered Jan 09 '17 at 04:38

You could use sapply instead of a for loop, but probably is still slow (but worth a try). dplyr's filter will be a little faster than base R's [] method as well.

For example,

dfr <- data.frame(Client_ID = sample(letters[1:15], 100, replace=TRUE), 
              Date=seq.Date(as.Date("2016-11-1"), length=100, by="1 day"))

row_fun <- function(i){
  subs <- dfr[dfr$Date > dfr$Date[i] - 30 & dfr$Date < dfr$Date[i],]
  sum(subs$Client_ID == dfr$Client_ID[i])
}
dfr$Freq_Last_30_Days <- sapply(1:nrow(dfr), row_fun)

Frequency based on unique date (alternative to a for loop) R

1 Answers1