I have a claim file with 2 columns: "Customer Id", "Date of the declaration".
I would like to see (and count) if a Customer was involved in more than one accident within X period of time (let's say one year).
My data looks like this:
Customer_Id Declaration_date
001 12/10/2017
001 12/10/2017
002 24/10/2017
003 25/10/2017
004 25/10/2017
001 05/12/2017
006 07/12/2017
Here it is:
D <- data.frame(Customer_Id = c(001, 001, 002, 003, 004, 001, 006),
Declaration_date = as.Date(c("12/10/2017", "12/10/2017", "24/10/2017", "25/10/2017", "25/10/2017", "05/12/2017", "07/12/2017"), format = "%d/%m/%Y"))
Here, we can see that the Customer "001" had two claims on the 12/10, but also had one claim on the 05/12. Thus, what I would like to have is a third column counting the number of different claims, based on the dates, the customer had since 01/01/2016 for instance. The output should look like this:
Customer_Id Declaration_date Number of claims
001 12/10/2017 2
001 12/10/2017 2
002 24/10/2017 1
003 25/10/2017 1
004 25/10/2017 1
001 05/12/2017 2
006 07/12/2017 1
Please note that having a customer id more than once for the same date should not sum the "number of claims". In my example, the Customer 001 have "2" claims because he had one (or more) claim on the 12/10, but also on the 05/12.
Any help would be very much appreciated.
Thanks a lot,