I'm stuck trying to do some counting on a data frame. The gist is to group by one variable and then, within each group, split further by a second variable. I then want to count the number of subgroups in each group. The sample code is this:

set.seed(123456)
df <- data.frame(User = c(rep("A", 5), rep("B", 4), rep("C", 6)), 
                 Rank = c(rpois(5,1), rpois(4,2), rpois(6,3)))

#This results in an error
df %>% group_by(User) %>% group_by(Rank) %>% summarize(Res = n_groups())

So what I want is for 'User A' to have 3, 'User B' to have 4, and 'User C' to have 5, i.e. the number of distinct Rank values for each user. In other words, the data frame df would end up looking like:

   User Rank Result
1     A    2      3
2     A    2      3
3     A    1      3
4     A    0      3
5     A    0      3
6     B    1      4
7     B    2      4
8     B    0      4
9     B    6      4
10    C    1      5
11    C    4      5
12    C    3      5
13    C    5      5
14    C    5      5
15    C    8      5

I'm still learning dplyr, so I'm unsure how I should do it. How can this be achieved? Non-dplyr answers are also very welcome. Thanks in advance!

J. Paul

2 Answers

Try this:

df %>% group_by(User) %>% mutate(Result=length(unique(Rank)))

Or, using dplyr's n_distinct():

df %>% group_by(User) %>% mutate(Result=n_distinct(Rank))
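
For context, here is a minimal sketch of the grouped mutate on a reproducible copy of the data. The Rank values are typed in from the table in the question instead of being regenerated from the seed; that is an assumption made so the result does not depend on the R version. The comment about why the original chain fails reflects dplyr's default behaviour as I understand it.

```r
library(dplyr)

# Rank values copied from the expected-output table in the question
df <- data.frame(
  User = c(rep("A", 5), rep("B", 4), rep("C", 6)),
  Rank = c(2, 2, 1, 0, 0, 1, 2, 0, 6, 1, 4, 3, 5, 5, 8)
)

# By default a second group_by() replaces the first grouping, and
# n_groups() expects a data frame as its argument, so the original
# chain cannot count ranks per user.
res <- df %>%
  group_by(User) %>%
  mutate(Result = n_distinct(Rank)) %>%  # distinct Rank values within each User
  ungroup()

# One row per user, to check the counts
distinct(res, User, Result)
```

With these values the counts should come out as 3 for A, 4 for B, and 5 for C, matching the table in the question.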
thc

A base R option is to use ave():

df$Result <- with(df, ave(Rank, User, FUN = function(x) length(unique(x))))
df$Result
#[1] 3 3 3 3 3 4 4 4 4 5 5 5 5 5 5

and a data.table option is

library(data.table)
setDT(df)[, Result := uniqueN(Rank), by = User]
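
If a one-row-per-user summary is also useful, a base R sketch along the same lines could look like the following. The Rank values are again copied from the table in the question (an assumption, to keep it reproducible), and `counts` is just an illustrative name.

```r
# Rank values copied from the expected-output table in the question
df <- data.frame(
  User = c(rep("A", 5), rep("B", 4), rep("C", 6)),
  Rank = c(2, 2, 1, 0, 0, 1, 2, 0, 6, 1, 4, 3, 5, 5, 8)
)

# One count of distinct Rank values per user (named by User)
counts <- tapply(df$Rank, df$User, function(x) length(unique(x)))

# Map the per-user counts back onto every row by name
df$Result <- as.integer(counts[as.character(df$User)])
df
```

This should give the same Result column as the ave() call, with `counts` holding the per-user summary on its own.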
akrun