I have data listing start and end dates for different products for a big number of users. The intervals for different products can overlap or have time gaps between purchases:
user_id start_date end_date product
12 31/10/2010 31/10/2011 A
12 18/12/2010 18/12/2011 A
12 31/10/2011 28/04/2014 B
12 18/12/2011 18/12/2014 A
12 27/03/2014 27/03/2015 A
12 18/12/2014 18/12/2016 B
12 27/03/2015 27/03/2016 B
12 18/12/2016 18/12/2017 D
33 01/07/1992 01/07/2016 A
33 20/08/1993 16/08/2016 B
33 28/10/1999 15/11/2012 A
33 31/01/2006 28/02/2006 B
33 26/08/2016 26/01/2017 C
I would like to obtain the number of days of overlaps of all potential combinations of products for each patient.
user_id A_B A_C A_D B_C B_D C_D
12 20 days 0 days 10 days 0 days 0 days 0 days
33 10 days 0 days 0 days 0 days 20 days 20 days
Is there a quick and elegant way to code it, hopefully in dplyr?
Thank you for your help!
Code:
library(lubridate)
library(Hmisc)
library(dplyr)
user_id <- c(rep(12, 8), rep(33, 5))
start_date <- dmy(Cs(31/10/2010, 18/12/2010, 31/10/2011, 18/12/2011, 27/03/2014, 18/12/2014, 27/03/2015, 18/12/2016, 01/07/1992, 20/08/1993, 28/10/1999, 31/01/2006, 26/08/2016))
end_date <- dmy(Cs(31/10/2011, 18/12/2011, 28/04/2014, 18/12/2014, 27/03/2015, 18/12/2016, 27/03/2016, 18/12/2017,
01/07/2016, 16/08/2016, 15/11/2012, 28/02/2006, 26/01/2017))
product <- c("A", "A","B","A","A","B","B","D","A","B","A","B", "C")
data <- data.frame(user_id, start_date, end_date, product )