
I am trying to get a simple way to count the number of distinct categories in a column of a dataframe.

For example, the iris data frame has 150 rows, and one of its columns, Species, contains 3 different species. I want to be able to run a bit of code and determine that there are 3 different species in that column. I do not care how many rows each unique entry corresponds to (which is mostly what I found in my research), just how many distinct values there are.

I was thinking something like this:

df <- iris
choices <- count(unique(iris$Species))

Does a solution as simple as this exist? I have looked at these posts, but they either examine the entire data frame rather than a single column in that data frame or provide a more complicated solution than what I am hoping for.

count number of instances in data frame

Count number of occurrences of categorical variables in data frame (R)

How to count number of unique character vectors within a subset of data

Henrik
User247365

7 Answers


The following should do the job:

choices <- length(unique(iris$Species))
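One caveat worth sketching for factor columns: length(unique(x)) counts the values actually present, while nlevels(x) counts the declared levels, and NA is treated as a value of its own. The subset and the variable names below are hypothetical, chosen only for illustration:

```r
# Hypothetical subset of iris$Species with one level filtered out
species_subset <- iris$Species[iris$Species != "setosa"]

n_observed   <- length(unique(species_subset))      # values actually present
n_declared   <- nlevels(species_subset)             # declared levels, used or not
n_after_drop <- nlevels(droplevels(species_subset)) # levels after dropping unused ones

# NA counts as a distinct value unless it is removed first
with_na      <- c("a", "b", NA, "a")
n_with_na    <- length(unique(with_na))
n_without_na <- length(unique(with_na[!is.na(with_na)]))
```

For the unfiltered iris$Species column the two approaches agree, so the difference only matters after subsetting or when missing values are present.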
Imran Ali

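If we are using dplyr, n_distinct() gets the number of unique elements in each column:

library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr releases;
# across() is the suggested replacement
iris %>%
  summarise_each(funs(n_distinct))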
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1           35          23           43          22       3
akrun

If your need is to count the number of unique instances for each column of your data.frame, you can use sapply:

sapply(iris, function(x) length(unique(x)))
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#           35           23           43           22            3

For just one specific column, the code suggested by @Imran Ali (in the comments) is perfectly fine.
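As a possible variation on the sapply call, vapply declares the expected result type up front, so a column that returned anything other than a single integer would raise an error rather than silently change the shape of the result. This is only a sketch, and distinct_per_column is an illustrative name:

```r
# vapply() checks that every column yields exactly one integer
distinct_per_column <- vapply(iris, function(x) length(unique(x)), integer(1))

# Pull out the count for a single column by name
n_species <- distinct_per_column[["Species"]]
```

The result is a named integer vector with one entry per column, so a single column's count can be read off by name.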

agenis

Using data.table is easier:

library(data.table)
uniqueN(iris$Species)
David Arenburg
DMillan

Another way to count unique values across all columns in 'iris':

> df <- iris
> df$Species <- as.character(df$Species)
> aggregate(values ~ ind, unique(stack(df)), length)
           ind values
1 Petal.Length     43
2  Petal.Width     22
3 Sepal.Length     35
4  Sepal.Width     23
5      Species      3
Ram K

Another simple way to count with the tidyverse:

library(dplyr)
iris %>% 
  count(Species)

     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50
Buckbeak
  • They asked about counting the number of unique values, not how to count the number of observations in each group – camille Dec 07 '22 at 23:53
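To connect the per-group counts to the original question: count() returns one row per distinct value of Species, so the number of rows in its result is the number of distinct values. A minimal sketch, assuming dplyr is installed (n_species is an illustrative name):

```r
library(dplyr)

# One row per distinct Species, so nrow() gives the number of distinct values
n_species <- iris %>%
  count(Species) %>%
  nrow()
```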

dplyr 1.0.0 introduced across(), which, together with n_distinct(), makes this task relatively straightforward:

library(dplyr)

# for a specific column
iris %>% 
  summarise(across(Species, n_distinct))
#   Species
# 1       3

# only for factors
iris %>% 
  summarise(across(where(is.factor), nlevels))
#   Species
# 1       3

# for all columns 
iris %>% 
  summarise(across(everything(), n_distinct))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1           35          23           43          22       3