dplyr unique occurrence count on columns

Question

I want to get the number of unique values in one column, grouped by another column, using dplyr. Preferably it should be function friendly, that is, I can put it inside a function and it will work easily.

So for example for the following data frame.

test = data.frame(one=rep(letters[1:5],each=2), two=c(rep("c", 3), rep("d", 2), rep("e", 4), "f") )

   one two
1    a   c
2    a   c
3    b   c
4    b   d
5    c   d
6    c   e
7    d   e
8    d   e
9    e   e
10   e   f

I want the number of unique values in column two for each value of column one.

Desired output:

In column one, a has 1 unique value ("c" only), b has 2 unique values ("c" and "d"), c has 2 unique values ("d" and "e"), d has 1 unique value ("e"), and e has 2 unique values ("e" and "f").

I managed to get something working by calling group_by() twice and then summarize(). Is there a simpler way?

Hope this is understandable.

Thanks

score 4 · Answer 1 · answered Aug 29 '17 at 08:30


We can group by 'one' and get the number of unique elements of 'two' in each group with n_distinct().

library(dplyr)
test %>% 
    group_by(one) %>%
    summarise(n = n_distinct(two))
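The question also asks for something function friendly. A minimal sketch of one way to wrap the same pipeline, assuming an rlang/dplyr version that supports the `{{ }}` (embrace) operator (rlang 0.4.0 or later); `count_distinct_by` is a hypothetical helper name, not part of dplyr:

```r
library(dplyr)

# Example data from the question
test <- data.frame(
  one = rep(letters[1:5], each = 2),
  two = c(rep("c", 3), rep("d", 2), rep("e", 4), "f")
)

# Hypothetical helper: count distinct values of `value_col` within each `group_col`.
# `{{ }}` forwards the unquoted column names into the dplyr verbs.
count_distinct_by <- function(df, group_col, value_col) {
  df %>%
    group_by({{ group_col }}) %>%
    summarise(n = n_distinct({{ value_col }}))
}

res <- count_distinct_by(test, one, two)
res
# n is 1, 2, 2, 1, 2 for groups a, b, c, d, e
```

If the column names arrive as strings instead, the `.data` pronoun is an alternative, for example `group_by(.data[[group_col]])` together with `n_distinct(.data[[value_col]])`.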


akrun
