0

I have 10000 id's, which all have a different number of purchase dates (with a minimum of 1). Per customer id, I want to find out if the number of days between the purchase dates exceeds 183 days (which is half a year). I only need a yes or no (binary) answer. The code I have for now is the following, however it does not work

for (id in 10000){
  for (i in length(purchase_date[id])){
    for (j in length(purchase_date[id])-1){
      if (as.numeric(purchase_date[i]-purchase_date[j])>=183){
        above_half_year <- 1
        }else{
          above_half_year <- 0 
        }
    }
  }
}

The top of the dataset is as follows

   ID  purchase_date 
1   1  2014-01-13 
2   1  2014-04-14 
3   1  2014-08-13 
4   1  2014-09-12 
5   1  2014-11-12 
6   1  2015-02-13 
7   1  2017-02-14 
8   1  2018-12-13 
9   1  2019-04-15
10  2  2016-03-01 
11  3  2016-06-13 
12  3  2016-09-20
13  3  2016-10-20 
14  3  2016-11-21 
15  3  2016-12-20 

The expected output would be

   ID  purchase_date above_half_year
1   1  2014-01-13    0
2   1  2014-04-14    0
3   1  2014-08-13    0
4   1  2014-09-12    0
5   1  2014-11-12    0
6   1  2015-02-13    0
7   1  2017-02-14    1
8   1  2018-12-13    1
9   1  2019-04-15    0
10  2  2016-03-01    0
11  3  2016-06-13    0
12  3  2016-09-20    0
13  3  2016-10-20    0
14  3  2016-11-21    0
15  3  2016-12-20    0

Thank you in advance for your help! I look forward to your suggestions

Estelle
  • 19
  • 3
  • 1
    Please provide your expected output given that sample data. – r2evans Mar 16 '23 at 17:53
  • (in the question please) – r2evans Mar 16 '23 at 17:58
  • likely something like `dat$toolong <- ave(as.numeric(dat$purchase_date), dat$ID, FUN = function(z) c(0, diff(z))) >= 183`. See https://stackoverflow.com/q/11562656/3358272 – r2evans Mar 16 '23 at 17:59
  • The link above is more about aggregation than inline calculations, but the premises (plus my use of `ave`) are still the same. Doing per-group calculations safely are an important thing to master, I think Seth's answer is a good start using `dplyr`. Either way, I suggest when you consider using multiple nested `for`-loops in R, there is often a much better (faster, efficient, safer) way to do it. Glad you found an answer! – r2evans Mar 16 '23 at 18:05

1 Answers1

0

This is an approach with dplyr

df %>%
  group_by(ID) %>%
  mutate(above_half_year = if_else(purchase_date - lag(purchase_date) > 183, 1, 0))
Seth
  • 1,659
  • 1
  • 4
  • 11