1

I have a claim file with 2 columns: "Customer Id", "Date of the declaration".

I would like to see (and count) if a Customer was involved in more than one accident within X period of time (let's say one year).

My data looks like this:

Customer_Id     Declaration_date   
001             12/10/2017
001             12/10/2017
002             24/10/2017
003             25/10/2017
004             25/10/2017
001             05/12/2017
006             07/12/2017

Here it is:

D <- data.frame(Customer_Id = c(001, 001, 002, 003, 004, 001, 006),
            Declaration_date = as.Date(c("12/10/2017", "12/10/2017", "24/10/2017", "25/10/2017", "25/10/2017", "05/12/2017", "07/12/2017"), format = "%d/%m/%Y"))

Here, we can see that the Customer "001" had two claims on the 12/10, but also had one claim on the 05/12. Thus, what I would like to have is a third column counting the number of different claims, based on the dates, the customer had since 01/01/2016 for instance. The output should look like this:

Customer_Id     Declaration_date     Number of claims 
001             12/10/2017           2
001             12/10/2017           2
002             24/10/2017           1
003             25/10/2017           1
004             25/10/2017           1
001             05/12/2017           2
006             07/12/2017           1

Please note that having a customer id more than once for the same date should not sum the "number of claims". In my example, the Customer 001 have "2" claims because he had one (or more) claim on the 12/10, but also on the 05/12.

Any help would be very much appreciated.

Thanks a lot,

Wael
  • 1,640
  • 1
  • 9
  • 20
AllanLC
  • 167
  • 2
  • 11

1 Answers1

3

We can use ave from base R to create a column by getting lengthofunique` elements of 'Declaration_date'

with(D, ave(as.numeric(Declaration_date), Customer_Id, FUN = function(x) length(unique(x))))

Or with dplyr

library(dplyr)
D %>%
  group_by(Customer_Id) %>%
  mutate(Number_of_claims = n_distinct(Declaration_date))

Or using data.table

library(data.table)
setDT(D)[,  Number_of_claims := uniqueN(Declaration_date), Customer_Id]
akrun
  • 874,273
  • 37
  • 540
  • 662