
I want to get the number of unique values in one column, grouped by another column, using dplyr. Preferably the solution should be function-friendly, that is, I can put it in a function and it will work easily.

For example, take the following data frame:

test = data.frame(one=rep(letters[1:5],each=2), two=c(rep("c", 3), rep("d", 2), rep("e", 4), "f") )

   one two
1    a   c
2    a   c
3    b   c
4    b   d
5    c   d
6    c   e
7    d   e
8    d   e
9    e   e
10   e   f

I want the number of unique values of column two for each value of column one.

Desired output:

  one n
1   a 1
2   b 2
3   c 2
4   d 1
5   e 2

From column one: a has 1 unique value ("c"), b has 2 unique values ("c" and "d"), c has 2 unique values ("d" and "e"), d has 1 unique value ("e"), and e has 2 unique values ("e" and "f").
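As a quick sanity check on those counts (not part of the dplyr question itself), base R's tapply() can count the unique values of two within each level of one. This is just a sketch for cross-checking the desired output:

```r
# Same example data as above
test <- data.frame(one = rep(letters[1:5], each = 2),
                   two = c(rep("c", 3), rep("d", 2), rep("e", 4), "f"))

# For each value of `one`, count how many distinct values of `two` occur
counts <- tapply(test$two, test$one, function(x) length(unique(x)))
counts
```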

I managed to get something working by using group_by() twice and summarize(). Is there a simpler way?
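The code for the two-step approach isn't shown, so the following is only a guess at what a "group_by() twice" version might look like: first collapse to one row per (one, two) pair, then count rows per value of one.

```r
library(dplyr)

test <- data.frame(one = rep(letters[1:5], each = 2),
                   two = c(rep("c", 3), rep("d", 2), rep("e", 4), "f"))

res <- test %>%
  group_by(one, two) %>%
  summarise(.groups = "drop") %>%  # one row per distinct (one, two) pair
  group_by(one) %>%
  summarise(n = n())               # number of distinct `two` values per `one`
res
```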

Hope this is understandable.

Thanks

chrk623

1 Answer


We can group by 'one' and get the number of unique elements of 'two' with n_distinct():

library(dplyr)
test %>% 
    group_by(one) %>%
    summarise(n = n_distinct(two))
akrun
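Since the question also asks for something function-friendly, here is a sketch of wrapping the same pipeline in a function. It assumes dplyr >= 1.0 (for `{{ }}` "embracing" and the `.groups` argument); the names `count_distinct_by` and `count_distinct_by_chr` are hypothetical. The first takes bare column names, the second takes column names as strings via the `.data` pronoun.

```r
library(dplyr)

# Hypothetical helper: bare column names, forwarded with {{ }}
count_distinct_by <- function(df, group, col) {
  df %>%
    group_by({{ group }}) %>%
    summarise(n = n_distinct({{ col }}), .groups = "drop")
}

# Hypothetical helper: column names passed as strings, via .data[[ ]]
count_distinct_by_chr <- function(df, group, col) {
  df %>%
    group_by(.data[[group]]) %>%
    summarise(n = n_distinct(.data[[col]]), .groups = "drop")
}

test <- data.frame(one = rep(letters[1:5], each = 2),
                   two = c(rep("c", 3), rep("d", 2), rep("e", 4), "f"))

res1 <- count_distinct_by(test, one, two)
res2 <- count_distinct_by_chr(test, "one", "two")
res1
```

The string version is handy when the column names come from a character vector or a loop; the `{{ }}` version reads more like ordinary dplyr code.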