I want to find the number (count) of unique males, females and neutrals in my dataset.

    # create a reproducible data frame
    gender <- c("Male", "Male", "Female", "Male", "Female", "Female", "Neutral", "Neutral", "Neutral")
    name <- c("Alex", "Andrew", "Amelie", "Alex", "Amelie", "Amelie", "Amanda", "Amber", "Alessia")

    # build the data frame directly instead of going through cbind()
    df <- data.frame(gender, name)

Here is what I tried, but it isn't what I want:

    library(dplyr)

    by_gender <- df %>% 
      group_by(gender, name) %>% 
      count(gender)

I want to write a line of code that tells me that there are 2 unique "Male" names, 1 unique "Female" name and 3 unique "Neutral" names in my dataset.

Essan Rago

4 Answers

You can use the function n_distinct from dplyr to count the distinct names within each gender:

    library(dplyr)

    df %>% 
      group_by(gender) %>% 
      summarise(n = n_distinct(name))

      gender      n
      <chr>   <int>
    1 Female      1
    2 Male        2
    3 Neutral     3

Vinícius Félix


You could also combine distinct and count, following the approach you started with. First drop the repeated gender/name pairs, then count the remaining rows per gender:

    library(dplyr)

    df %>%
      distinct(gender, name) %>%
      count(gender)

       gender n
    1  Female 1
    2    Male 2
    3 Neutral 3

M Daaboul


Another option, using the data.table package:

    library(data.table)
    setDT(df)[, .(count = uniqueN(name)), by = gender]

Console output:

    #    gender count
    #1:    Male     2
    #2:  Female     1
    #3: Neutral     3

AlSub


In base R, without any dependencies, simply:

    table( unique(df)$gender )

    # Female    Male Neutral
    #      1       2       3
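
One caveat worth noting: unique(df) drops duplicate rows, so the one-liner gives the wanted counts only because df holds nothing but gender and name. Below is a minimal sketch of how this could be made robust to extra columns. The data frame df2 and its id column are hypothetical and added here purely for illustration; they are not part of the question.

```r
# Rebuild the question's data, plus a hypothetical extra column `id`
# (an assumption, not in the original) that makes every row unique.
df2 <- data.frame(
  gender = c("Male", "Male", "Female", "Male", "Female", "Female",
             "Neutral", "Neutral", "Neutral"),
  name   = c("Alex", "Andrew", "Amelie", "Alex", "Amelie", "Amelie",
             "Amanda", "Amber", "Alessia"),
  id     = 1:9
)

# Keep only the two relevant columns before dropping duplicate rows,
# then tabulate the remaining genders.
counts <- table(unique(df2[, c("gender", "name")])$gender)

# Alternative: count the distinct names within each gender directly.
counts2 <- tapply(df2$name, df2$gender, function(x) length(unique(x)))

print(counts)
print(counts2)
```

Both variants ignore the id column, so they should keep working if the real dataset has more columns than the example.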
Andre Wildberg