1

I want to calculate the relationship length per customer ID, however the relationshiplength should be the same for every purchase using the ID. This length is either defection date - first date (if defection is 1) OR 2202-02-11 - first date (if defection = 0). Now it generates for every entry seperately, instead of the ID.

## Relationship length
dataset$relationship_length <- if_else(dataset$defected == 1, as.numeric(dataset$defection_date - dataset$first_donation_date), 
                                       as.numeric(as.Date("2022-02-11") - dataset$first_donation_date))

Below one can see the current dataset and the desired outcome.

ID first_donation_date date_of_donation defected defection_date relationship_length
1  2014-01-13          2014-01-13       0        NA             2951
1  2014-01-13          2014-04-14       0        NA             2951
1  2014-01-13          2014-08-13       0        NA             2951
1  2014-01-13          2014-09-12       0        NA             2951
1  2014-01-13          2014-11-12       0        NA             2951
1  2014-01-13          2015-02-13       0        NA             2951
1  2014-01-13          2017-02-14       1        2017-02-14     1128
1  2014-01-13          2018-12-13       1        2018-12-13     1795
2  2013-12-02          2013-12-02       0        NA             2993
2  2013-12-02          2014-05-02       0        NA             2993

The desired outcome is that all relationship_length for ID 1 are 1128 (the first defected=1). And for ID 2 it is 2993, as the customer never defected.

I hope this explains my question and i look forward to the answers

Estelle
  • 19
  • 3
  • This is a "calculate/summarize by group" operation, see https://stackoverflow.com/q/11562656/3358272, https://stackoverflow.com/q/1660124/3358272 for other ways to approach it (including very similar to jpsmith's answer). – r2evans Apr 17 '23 at 15:59

1 Answers1

0

There may be more elegant solutions, but here is one dplyr approach to apply duration to all rows in the group based on the first instance of defected == 1:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(relationship_length = ifelse(any(defected == 1), 
                               (defection_date[defected == 1] - first_donation_date)[defected == 1][1], 
                               as.Date("2022-02-11") - first_donation_date))

# or with newer `dplyr` versions using `.by`:

df %>%
  mutate(relationship_length = ifelse(any(defected == 1), 
                               (defection_date[defected == 1] - first_donation_date)[defected == 1][1], 
                               as.Date("2022-02-11") - first_donation_date),
         .by = ID)

Output:

      ID first_donation_date date_of_donation defected defection_date relationship_length
   <int> <date>              <date>              <int> <date>                       <dbl>
 1     1 2014-01-13          2014-01-13              0 NA                            1128
 2     1 2014-01-13          2014-04-14              0 NA                            1128
 3     1 2014-01-13          2014-08-13              0 NA                            1128
 4     1 2014-01-13          2014-09-12              0 NA                            1128
 5     1 2014-01-13          2014-11-12              0 NA                            1128
 6     1 2014-01-13          2015-02-13              0 NA                            1128
 7     1 2014-01-13          2017-02-14              1 2017-02-14                    1128
 8     1 2014-01-13          2018-12-13              1 2018-12-13                    1128
 9     2 2013-12-02          2013-12-02              0 NA                            2993
10     2 2013-12-02          2014-05-02              0 NA                            2993

Data

df <- read.table(text = "ID first_donation_date date_of_donation defected defection_date relationship_length
1  2014-01-13          2014-01-13       0        NA             2951
1  2014-01-13          2014-04-14       0        NA             2951
1  2014-01-13          2014-08-13       0        NA             2951
1  2014-01-13          2014-09-12       0        NA             2951
1  2014-01-13          2014-11-12       0        NA             2951
1  2014-01-13          2015-02-13       0        NA             2951
1  2014-01-13          2017-02-14       1        2017-02-14     1128
1  2014-01-13          2018-12-13       1        2018-12-13     1795
2  2013-12-02          2013-12-02       0        NA             2993
2  2013-12-02          2014-05-02       0        NA             2993", h = TRUE)

df[c(2,3,5)] <- lapply(df[c(2,3,5)], as.Date)
df <- df[-6]
jpsmith
  • 11,023
  • 5
  • 15
  • 36