I have the following dataset:

Names   Category
Jack    1
Jack    1
Jack    1
Tom     0
Tom     0
Sara    0
Sara    0

What I am looking for is the following:

Category Number
0        2
1        1

That is, the number of unique values in the Names column for each category.

I can get the number of unique values in the first column:

length(unique(df$Names))

and the number of rows that belong to a given category in the second column:

length(which(df$Category== 1))

but this is not the result I am looking for.
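
For reference, here is a minimal sketch that rebuilds the example data and runs the two attempts above. The `data.frame` construction is an assumption, since only the printed table is shown, and `n_names` and `n_cat1` are hypothetical names used for illustration. Both attempts return a single number for the whole column rather than one count per category.

```r
# Rebuild the example data (assumed construction; only the printed table is given).
df <- data.frame(
  Names = c("Jack", "Jack", "Jack", "Tom", "Tom", "Sara", "Sara"),
  Category = c(1, 1, 1, 0, 0, 0, 0)
)

# Attempt 1: distinct names over the whole column, ignoring Category.
n_names <- length(unique(df$Names))        # 3 (Jack, Tom, Sara)

# Attempt 2: rows whose Category equals 1, counting repeats.
n_cat1 <- length(which(df$Category == 1))  # 3 (the three Jack rows)
```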

smci
cplus

3 Answers

Or use aggregate in base R:

aggregate(Names ~ Category, data=df, FUN=function(x) length(unique(x)))
#   Category Names
# 1        0     2
# 2        1     1
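
A possible variant of the same base R idea uses `tapply` instead of `aggregate`. This is only a sketch: it assumes `df` has the structure shown in the question (rebuilt here so the snippet runs on its own), and `counts` and `result` are hypothetical names.

```r
# Rebuild the example data from the question (assumed construction).
df <- data.frame(
  Names = c(rep("Jack", 3), rep("Tom", 2), rep("Sara", 2)),
  Category = c(1, 1, 1, 0, 0, 0, 0)
)

# For each Category, count the distinct Names.
counts <- tapply(df$Names, df$Category, function(x) length(unique(x)))

# Turn the named result into a two-column data frame.
result <- data.frame(Category = as.numeric(names(counts)),
                     Number = as.vector(counts))
result
```

`tapply` returns its result named by category, so the extra `data.frame` step is needed only if a table with the Category and Number columns is wanted.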
lmo

Using data.table

library(data.table)
setDT(df)[, .(Number =uniqueN(Names)), by = Category]
#    Category Number
#1:        1      1
#2:        0      2
akrun

Using dplyr. You don't even need to manually get the unique Names first:

df <- data.frame(Names=c(rep('Jack',3),rep('Tom',2),rep('Sara',2)),
                 Category=c(1,1,1,0,0,0,0))
library(dplyr)

df %>% group_by(Category) %>% summarize(Number = n_distinct(Names))

#   Category Number
#      <dbl>  <int>
# 1        0      2
# 2        1      1

# and you can use as.data.frame(...) on that if you like

UPDATED: it was not clear from the OP's original wording that they wanted to first group by Category, then count the number of distinct Names within each group.

smci