I have a large sales dataset (> 1 M rows) that contains 2 columns:
- Client_ID
- Purchase_Date
Client ID often repeats.
For each row, I would like to calculate "How many times has the client purchased in the last 30 days."
- Client_ID
- Purchase_Date
- Freq_Last_30_Days
Currently, I loop through each row and do a nrow() count using the following criteria:
data$Client_ID == data$Client_ID[i]
data$Purchase_Date < data$Purchase_Date[i]
data$Purchase_Date >= data$Purchase_Date[i] - 30
Given the database size, this is taking over hours to do. Is there a more efficient way of finding the frequency without using a for loop?