I have this data basically, but larger:
I want to count a number of distinct combinations of (customer_id, account_id) - that is, distinct or unique values based on two columns, but for each start_date. I can't find the solution anywhere. The result should be another column added to my data.table that should look like this:
That is, for each start_date, it calculates number of distinct values based on both customer_id and account_id.
For example, for start_date equal to 2.2.2018, I have distinct combinations in (customer_id,account_id) being (4,22) (5,38) and (6,13), so I want count to be equal to 3 because I have 3 distinct combinations. I also need the solution to work with character values in customer_id and account_id columns.
Code to replicate the data:
customer_id <- c(1,1,1,2,3,3,4,5,5,6)
account_id <- c(11,11,11,11,55,88,22,38,38,13)
start_date <- c(rep(as.Date("2017-01-01","%Y-%m-%d"),each=6),rep(as.Date("2018-02-02","%Y-%m-%d"),each=4))
data <- data.table(customer_id,account_id,start_date)