
I'm trying to count how many times each unique string value in column z occurs within each combination of two other columns (x, y) in a data.table. I'd like to use the data.table package or something equally fast, since my real data has millions of rows.

I have data like this:

library(data.table)
dt <- data.table(x=c("aa","aa","aa","bb","cc","cc","cc","cc","cc","cc"), y=c(2,2,1,1,1,1,2,2,2,3), z=c("d","d","a","d","a","a","e","e","b","a"))

     x y z
 1: aa 2 d
 2: aa 2 d
 3: aa 1 a
 4: bb 1 d
 5: cc 1 a
 6: cc 1 a
 7: cc 2 e
 8: cc 2 e
 9: cc 2 b
10: cc 3 a

I'd like to have it like this:

dt.desired <- data.table(x=c("aa","aa", "bb","cc", "cc","cc", "cc"), y=c(1,2,1,1,2,2,3), z=c("a","d","d","a","b","e","a"), n=c(1,2,1,2,1,2,1))


    x y z n
1: aa 1 a 1
2: aa 2 d 2
3: bb 1 d 1
4: cc 1 a 2
5: cc 2 b 1
6: cc 2 e 2
7: cc 3 a 1

1 Answer


You can do this with count() from dplyr and the %>% pipe (from magrittr, re-exported by dplyr), both available once the tidyverse is loaded:

library(data.table)
library(tidyverse)

> dt %>% count(x,y,z)
# A tibble: 7 x 4
  x         y z         n
  <chr> <dbl> <chr> <int>
1 aa       1. a         1
2 aa       2. d         2
3 bb       1. d         1
4 cc       1. a         2
5 cc       2. b         1
6 cc       2. e         2
7 cc       3. a         1

If you want to keep the result as a new data frame, assign it to a variable (named so it doesn't clash with the column z), for example:

counts <- dt %>% count(x,y,z)
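
Since the question asks for a data.table-native approach, here is a minimal sketch using the special symbol .N, which holds the number of rows in each group. It assumes the sample dt from the question; the result name counts is only illustrative. Using keyby rather than by also sorts the output by x, y, z, which should match the ordering of the desired table.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  x = c("aa","aa","aa","bb","cc","cc","cc","cc","cc","cc"),
  y = c(2,2,1,1,1,1,2,2,2,3),
  z = c("d","d","a","d","a","a","e","e","b","a")
)

# Count rows per (x, y, z) group; .N is the group size.
# keyby groups and sorts the result by the grouping columns.
counts <- dt[, .(n = .N), keyby = .(x, y, z)]
print(counts)
```

Note that n comes back as an integer column here, whereas dt.desired in the question stores it as a double; convert it with as.numeric() if the types need to match exactly.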