Count number of unique levels of a variable

Question

I am trying to get a simple way to count the number of distinct categories in a column of a dataframe.

For example, the iris data frame has 150 rows, and its Species column contains 3 different species. I want to run a bit of code on that column and get back 3. I do not care how many rows each unique entry corresponds to (which is mostly what I found in my research), just how many distinct values there are.

I was thinking something like this:

df <- iris
choices <- count(unique(iris$Species))

Does a solution as simple as this exist? I have looked at these posts, but they either examine the entire data frame rather than a single column in that data frame or provide a more complicated solution than what I am hoping for.

count number of instances in data frame

Count number of occurrences of categorical variables in data frame (R)

How to count number of unique character vectors within a subset of data

@ImranAli that was perfect as long as I specified `choices <- as.numeric(length(unique(iris$Species)))`. If you make your comment an answer I will mark it as correct. — User247365, Jul 21 '16 at 00:42
To get count for all columns: `lengths(lapply(iris, unique))` https://stackoverflow.com/questions/22196078/r-count-unique-values-for-every-column — zx8754, May 21 '18 at 09:19

score 11 · Accepted Answer · answered Jul 21 '16 at 01:18

The following should do the job:

choices <- length(unique(iris$Species))
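Two edge cases are worth checking when using this on other data. The sketch below uses base R only; the subset `no_setosa` and the vector `x` are made up for illustration. It shows that `unique()` counts the values that actually occur, `nlevels()` counts the levels defined on the factor (including unused ones), and `unique()` treats `NA` as a value of its own.

```r
# Base R only; `iris` ships with R.
n_species <- length(unique(iris$Species))

# Hypothetical subset: one species is removed, but the factor keeps all levels.
no_setosa <- iris[iris$Species != "setosa", ]
n_present <- length(unique(no_setosa$Species))       # values that actually occur
n_levels  <- nlevels(no_setosa$Species)              # levels defined on the factor
n_dropped <- nlevels(droplevels(no_setosa$Species))  # levels after dropping unused ones

# unique() keeps NA as a distinct value; na.omit() removes it first.
x <- c("a", NA, "a", "b")
with_na    <- length(unique(x))
without_na <- length(unique(na.omit(x)))
```

If the question is "how many categories appear in the data", `length(unique(...))` is the safer choice; if it is "how many categories is this factor declared to have", `nlevels()` answers that instead.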

Imran Ali

2,223
2
28
41

score 4 · Answer 2 · answered Jul 21 '16 at 03:02

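If we are using dplyr, `n_distinct` gets the number of unique elements in each column. (Note that `summarise_each()` and `funs()` have since been deprecated in dplyr; the code below still reflects the API at the time of writing.)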

library(dplyr)
iris %>%
      summarise_each(funs(n_distinct))
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1           35          23           43          22       3

score 3 · Answer 3 · answered Jul 21 '16 at 00:29

If your need is to count the number of unique instances for each column of your data.frame, you can use sapply:

sapply(iris, function(x) length(unique(x)))
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
#           35           23           43           22            3

For just one specific column, the code suggested by @Imran Ali (in the comments) is perfectly fine.
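As a variation on the same idea, `vapply()` can be used in place of `sapply()` so that the result type is checked. This is only a sketch; the helper name `count_unique` is made up for illustration.

```r
# Base R only; `iris` ships with R.
# count_unique is a hypothetical helper name, not part of any package.
count_unique <- function(x) length(unique(x))

# vapply() requires every result to be a single integer,
# so the output is always a named integer vector.
per_column <- vapply(iris, count_unique, integer(1))

# Pull out the count for one column by name.
n_species <- per_column[["Species"]]
```

Unlike `sapply()`, `vapply()` raises an error if the function ever returns something other than a single integer, which makes it a little safer inside scripts and functions.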

score 3 · Answer 4 · edited May 21 '18 at 09:30

Using data.table is easier:

require(data.table)
uniqueN(iris$Species)

answered May 21 '18 at 09:17 by DMillan · edited May 21 '18 at 09:30 by David Arenburg

score 0 · Answer 5 · answered Jul 21 '16 at 01:36

Another way to count unique values across all columns in 'iris':

> df <- iris
> df$Species <- as.character(df$Species)
> aggregate(values ~ ind, unique(stack(df)), length)
           ind values
1 Petal.Length     43
2  Petal.Width     22
3 Sepal.Length     35
4  Sepal.Width     23
5      Species      3

score 0 · Answer 6 · answered Jan 03 '21 at 19:59

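Another simple way to count with the tidyverse. This returns one row per species with its number of observations, so the number of rows in the result is the number of distinct species:

library(dplyr)

iris %>% 
  count(Species)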

     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50

Buckbeak


They asked about counting the number of unique values, not how to count the number of observations in each group – camille Dec 07 '22 at 23:53
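To connect the per-group counts to the number the question asks for, one option is to count the rows of the `count()` result, or to call `n_distinct()` on the column directly. This is a minimal sketch that assumes dplyr is installed; the variable names are made up for illustration.

```r
library(dplyr)

# One row per species, with the number of observations in column n.
per_species <- iris %>%
  count(Species)

# The number of rows in that table is the number of distinct species.
n_from_count <- nrow(per_species)

# n_distinct() gives the same number without building the table.
n_direct <- n_distinct(iris$Species)
```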

score 0 · Answer 7 · answered May 12 '21 at 09:23

dplyr 1.0.0 introduced `across()`, which, together with `n_distinct()`, makes this task relatively straightforward:

library(dplyr)

# for a specific column
iris %>% 
  summarise(across(Species, n_distinct))
#   Species
# 1       3

# only for factors
iris %>% 
  summarise(across(where(is.factor), nlevels))
#   Species
# 1       3

# for all columns 
iris %>% 
  summarise(across(everything(), n_distinct))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1           35          23           43          22       3
