I have a data set which contains a number of unique identifiers for each date, e.g.
df <- data.frame(date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02")),
                 ids = c(3, 4, 1, 3))
I'd then like to summarise this information to get, for each date, the number of new unique ids, i.e. ids that have not appeared on any earlier date. For example, on January 1 there are two unique ids (3 and 4). But on January 2 there is only one new unique id (1), because 3 already appeared on January 1. So the resulting data frame should look like:
date n_new_unique_ids
2016-01-01 2
2016-01-02 1
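To make the intended logic concrete, here is a rough base R sketch of what I'm after. It assumes that an id counts as "new" only on the earliest date it appears, and the names first_seen, counts and result are just placeholders I made up for illustration:

```r
df <- data.frame(
  date = as.Date(c("2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02")),
  ids = c(3, 4, 1, 3)
)

# Sort by date so that "first occurrence" means "earliest date"
df <- df[order(df$date), ]

# duplicated() flags every occurrence of an id after its first one,
# so negating it keeps exactly one row per id: its first appearance
first_seen <- df[!duplicated(df$ids), ]

# Count the first appearances per date
counts <- table(first_seen$date)
result <- data.frame(
  date = as.Date(names(counts)),
  n_new_unique_ids = as.integer(counts)
)
```

One limitation of this sketch: a date on which no new id appears is dropped from result rather than shown with a count of 0.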
Is this possible with dplyr? I had a look at lag, but a fixed lag size doesn't make sense in this context. Or perhaps with another package?