Count how often an id appears more than once on the same day in R

Question

I'm trying to count how often an id appears more than once on the same day in my data.

I have a column of dates and column of personal ids.

set.seed(42) 
n     <- 10^5 
ids   <- sample(1:5000, n, replace=T)
dates <- as.Date("1943-07-13", "%Y-%m-%d") - sample(1:9000, n, replace=T)

The solution I have come up with:

sum(table(ids, dates)>1)

The problem is (lack of) efficiency. On a bigger data set I get an error message.

Fyi, you can write `1e5` so R doesn't have to do the computation 10^5. — Frank, Jul 13 '16 at 16:47

score 2 · Accepted Answer · answered Jul 13 '16 at 16:35

We can use data.table. Create a data.table with 'ids', and 'dates', get the number of rows grouped by those columns, check whether that is greater than 1, extract that column and get the sum.

sum(data.table(ids, dates)[, .N > 1, .(ids, dates)]$V1)

Benchmarks

system.time(sum(data.table(ids, dates)[, .N > 1, .(ids, dates)]$V1))
#   user  system elapsed 
#  0.05    0.00    0.05 
system.time(sum(table(ids, dates)>1))
#   user  system elapsed 
#   0.87    0.19    1.07

Count how often an id appears more than once on the same day in R

1 Answers1

Benchmarks

Linked