I'm trying to count how many times each unique string value in column z occurs within each combination of two other columns (x, y) in a data.table. I'd like to use the data.table package or something equally fast, since I have millions of rows to run this on.
I have data like this:
library(data.table)
dt <- data.table(x = c("aa","aa","aa","bb","cc","cc","cc","cc","cc","cc"),
                 y = c(2,2,1,1,1,1,2,2,2,3),
                 z = c("d","d","a","d","a","a","e","e","b","a"))
     x y z
 1: aa 2 d
 2: aa 2 d
 3: aa 1 a
 4: bb 1 d
 5: cc 1 a
 6: cc 1 a
 7: cc 2 e
 8: cc 2 e
 9: cc 2 b
10: cc 3 a
I'd like the result to look like this, where n is the number of rows sharing the same x, y and z:
dt.desired <- data.table(x = c("aa","aa","bb","cc","cc","cc","cc"),
                         y = c(1,2,1,1,2,2,3),
                         z = c("a","d","d","a","b","e","a"),
                         n = c(1,2,1,2,1,2,1))
    x y z n
1: aa 1 a 1
2: aa 2 d 2
3: bb 1 d 1
4: cc 1 a 2
5: cc 2 b 1
6: cc 2 e 2
7: cc 3 a 1
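For reference, here is a minimal sketch of one way this count might be expressed in data.table: group by all three columns and take the group size with .N, using keyby so the result comes back sorted by x, y, z. The name res is only illustrative, and I have not benchmarked this on millions of rows, so treat it as a starting point rather than a confirmed solution.

```r
library(data.table)

# Sample data from above
dt <- data.table(x = c("aa","aa","aa","bb","cc","cc","cc","cc","cc","cc"),
                 y = c(2,2,1,1,1,1,2,2,2,3),
                 z = c("d","d","a","d","a","a","e","e","b","a"))

# Count rows per (x, y, z) combination; .N is the number of rows in each group.
# keyby groups and also sorts the result by the grouping columns.
res <- dt[, .(n = .N), keyby = .(x, y, z)]
print(res)
```

Note that .N produces an integer column, whereas n in dt.desired above is numeric, so a strict comparison of the two tables would need a type conversion first.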