189

Let's say I have:

v = rep(c(1,2, 2, 2), 25)

Now, I want to count the number of times each unique value appears. unique(v) returns the unique values, but not how many times each one occurs.

> unique(v)
[1] 1 2

I want something that gives me

length(v[v==1])
[1] 25
length(v[v==2])
[1] 75

but as a more general one-liner :) Something close (but not quite) like this:

#<doesn't work right> length(v[v==unique(v)])
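
The attempt above does not work because `v == unique(v)` compares the two vectors position by position, recycling the shorter one, instead of testing each unique value against all of v. A minimal base-R sketch of a one-liner that does count per value, using only the v from the question (the name `counts` is just illustrative):

```r
v <- rep(c(1, 2, 2, 2), 25)

# Recycling: unique(v) is c(1, 2), which is repeated along v and compared
# position by position, so this is not a per-value count.
length(v[v == unique(v)])

# Count each unique value separately, then label the counts.
counts <- vapply(unique(v), function(x) sum(v == x), integer(1))
names(counts) <- unique(v)
counts
```

This is only meant to show what the one-liner in the question was reaching for; it scans v once per distinct value, so it is best suited to vectors with few distinct values.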
Henrik
gakera

14 Answers

222

Perhaps table is what you are after?

dummyData = rep(c(1,2, 2, 2), 25)

table(dummyData)
# dummyData
#  1  2 
# 25 75

## or another presentation of the same data
as.data.frame(table(dummyData))
#    dummyData Freq
#  1         1   25
#  2         2   75
Gregor Thomas
Chase
  • Ah, yes, I can use this, with some slight modification: t(as.data.frame(table(v))[,2]) is exactly what I need, thank you – gakera Nov 18 '10 at 13:30
  • I used to do this awkwardly with `hist`. `table` seems quite a bit slower than `hist`. I wonder why. Can anyone confirm? – Museful Aug 22 '13 at 23:03
  • Chase, any chance to order by frequency? I have the exact same problem, but my table has roughly 20000 entries and I'd like to know how frequent the most common entries are. – Torvon Dec 01 '14 at 16:25
  • @Torvon - sure, just use `order()` on the results, i.e. `x <- as.data.frame(table(dummyData)); x[order(x$Freq, decreasing = TRUE), ]` – Chase Dec 02 '14 at 20:22
  • This method works well only for data with few distinct values that repeat often; it is a poor fit for large continuous data with few duplicated records. – Deep North Oct 10 '17 at 11:31
  • To count the number of levels in each column you may also use `lapply(DF, function(x) length(table(x)))` – Peter Feb 20 '19 at 12:40
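
The ordering question raised in the comments can also be handled without converting to a data frame, since `sort()` works on tables directly. A small sketch on made-up data (the vector `x` and the names `freq` and `top2` are illustrative, not from the thread):

```r
x <- c("a", "b", "b", "c", "c", "c")

# sort() has a method for tables, so the counts can be ordered directly.
freq <- sort(table(x), decreasing = TRUE)

# Keep only the two most common entries.
top2 <- head(freq, 2)
top2
```

With many distinct entries, `head()` on the sorted table is a convenient way to look at only the most frequent ones.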
42

If you have multiple factors (= a multi-dimensional data frame), you can use the dplyr package to count unique values in each combination of factors:

library("dplyr")
data %>% group_by(factor1, factor2) %>% summarize(count=n())

It uses the pipe operator %>% to chain function calls on the data frame data.
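
For comparison, roughly the same per-combination counts can be obtained in base R. The sketch below assumes a hypothetical data frame with columns named factor1 and factor2 as in the snippet above; the values are invented for illustration:

```r
# Hypothetical data; factor1 and factor2 mirror the names used above.
data <- data.frame(
  factor1 = c("a", "a", "b", "b", "b"),
  factor2 = c("x", "y", "x", "x", "y")
)

# Cross-tabulate the two columns, then flatten to one row per combination.
counts <- as.data.frame(with(data, table(factor1, factor2)),
                        responseName = "count")
counts
```

One difference worth keeping in mind: the table-based version also lists combinations that never occur (with a count of 0), whereas the grouped dplyr summary only returns combinations present in the data.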

antoine
29

Here is a one-line approach using aggregate:

> aggregate(data.frame(count = v), list(value = v), length)

  value count
1     1    25
2     2    75
SeaSprite
17

length(unique(df$col)) is the simplest way I can see to get the number of distinct values. Note that it returns a single number, not a count per value.

radek
Jeff Henderson
13

The table() function is a good way to go, as Chase suggested. If you are analyzing a large dataset, an alternative is to use the special symbol .N in the data.table package.

Make sure the data.table package is installed:

install.packages("data.table")

Code:

# Import the data.table package
library(data.table)

# Generate a data table object, which draws a number 10^7 times  
# from 1 to 10 with replacement
DT<-data.table(x=sample(1:10,1E7,TRUE))

# Count Frequency of each factor level
DT[,.N,by=x]
C. Zeng
7

To get an un-dimensioned integer vector that contains the count of unique values, use c().

dummyData = rep(c(1, 2, 2, 2), 25) # Chase's reproducible data
c(table(dummyData)) # get un-dimensioned integer vector
 1  2 
25 75

str(c(table(dummyData)) ) # confirm structure
 Named int [1:2] 25 75
 - attr(*, "names")= chr [1:2] "1" "2"

This may be useful if you need to feed the counts of unique values into another function, and is shorter and more idiomatic than the t(as.data.frame(table(dummyData))[,2]) posted in a comment to Chase's answer. Thanks to Ricardo Saporta, who pointed this out to me.

Ben
7

This works for me. Take your vector v

length(summary(as.factor(v),maxsum=50000))

Comment: set maxsum to be large enough to capture the number of unique values

or with the magrittr package

v %>% as.factor %>% summary(maxsum=50000) %>% length
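
To see why maxsum matters here: when a factor has more levels than maxsum, `summary()` keeps the most frequent levels and merges the rest into a single "(Other)" entry, so the length no longer equals the number of distinct values. A small sketch on an invented vector with 150 distinct values (variable names are illustrative):

```r
v <- 1:150  # 150 distinct values, more than the default maxsum of 100

# Without maxsum, the least frequent levels are merged into "(Other)".
n_default <- length(summary(as.factor(v)))

# With a large enough maxsum, every level is kept.
n_full <- length(summary(as.factor(v), maxsum = 50000))

c(default = n_default, full = n_full, unique = length(unique(v)))
```

If only the number of distinct values is needed, `length(unique(v))` gives the same answer without having to pick a maxsum.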

Anthony Ebert
6

Converting the values to a factor and calling summary() also works.

> v = rep(as.factor(c(1,2, 2, 2)), 25)
> summary(v)
 1  2 
25 75 
sedeh
5

You can also try a tidyverse approach:

library(tidyverse) 
dummyData %>% 
    as_tibble() %>% 
    count(value)
# A tibble: 2 x 2
  value     n
  <dbl> <int>
1     1    25
2     2    75
Roman
4

If you need the count of each value as an additional column in the data frame containing your values (a column which may represent sample size, for example), plyr provides a neat way:

data_frame <- data.frame(v = rep(c(1,2, 2, 2), 25))

library("plyr")
data_frame <- ddply(data_frame, .(v), transform, n = length(v))
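
If adding a package is not an option, a similar per-row count column can be sketched in base R with `ave()`. This is an alternative technique, not what the plyr call does internally, and it reuses the same example data:

```r
data_frame <- data.frame(v = rep(c(1, 2, 2, 2), 25))

# ave() applies FUN within each group and returns a vector aligned with the rows.
data_frame$n <- ave(data_frame$v, data_frame$v, FUN = length)

head(data_frame, 4)
```

Because `ave()` returns one value per input row, the data frame keeps its original row order.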
Lionel Henry
1

I know there are many other answers, but here is another way to do it using the sort and rle functions. The function rle stands for Run Length Encoding. It can be used for counts of runs of numbers (see the R man docs on rle), but can also be applied here.

test.data = rep(c(1, 2, 2, 2), 25)
rle(sort(test.data))
## Run Length Encoding
##   lengths: int [1:2] 25 75
##   values : num [1:2] 1 2

If you capture the result, you can access the lengths and values as follows:

## rle returns a list with two items.
result.counts <- rle(sort(test.data))
result.counts$lengths
## [1] 25 75
result.counts$values
## [1] 1 2
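
The sort step is what makes this work: rle counts consecutive runs, so on unsorted data the same value shows up once per run. A short sketch contrasting the two and pairing the values with their counts (the names `runs`, `sorted`, and `counts` are illustrative):

```r
test.data <- rep(c(1, 2, 2, 2), 25)

# Unsorted: rle() reports every run, so each value appears many times.
runs <- rle(test.data)
length(runs$lengths)

# Sorted: one run per distinct value; pair values with their counts.
sorted <- rle(sort(test.data))
counts <- setNames(sorted$lengths, sorted$values)
counts
```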
steveb
1

You can also try dplyr::count:

library(dplyr)  # also makes tibble() available
df <- tibble(x=c('a','b','b','c','c','d'), y=1:6)

dplyr::count(df, x, sort = TRUE)

# A tibble: 4 x 2
  x         n
  <chr> <int>
1 b         2
2 c         2
3 a         1
4 d         1
0

If you want to run unique on a data.frame (e.g., train.data), and also get the counts (which can be used as the weight in classifiers), you can do the following:

unique.count = function(train.data, all.numeric=FALSE) {
  # first convert each row in the data.frame to a string
  train.data.str = apply(train.data, 1, function(x) paste(x, collapse=','))
  # use table to index and count the strings
  train.data.str.t = table(train.data.str)
  # get the unique data string from the row.names
  train.data.str.uniq = row.names(train.data.str.t)
  weight = as.numeric(train.data.str.t)
  # convert the unique data string to data.frame
  if (all.numeric) {
    train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
      function(x) as.numeric(unlist(strsplit(x, split=","))))))
  } else {
    train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
      function(x) unlist(strsplit(x, split=",")))))
  }
  names(train.data.uniq) = names(train.data)
  list(data=train.data.uniq, weight=weight)
}
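
A shorter route to a similar result, sketched here as an alternative rather than a rewrite of the function above, is to let `aggregate()` sum a column of ones over each distinct row. The small data frame is hypothetical and the `weight` column name is chosen to match the function's output:

```r
# Hypothetical training data with one duplicated row.
train.data <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))

# Add a column of ones and sum it within each distinct row.
uniq <- aggregate(weight ~ ., data = cbind(train.data, weight = 1), FUN = sum)
uniq
```

This avoids the round trip through comma-separated strings, so column types are kept and values containing commas are not a problem. Note that the formula interface of `aggregate()` drops rows with missing values by default.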
-2
# Count how many times each element of wlist appears, using a named list
count_unique_words <- function(wlist) {
  ucountlist <- list()
  for (w in wlist) {
    # Use a character key so numeric input is not treated as a position
    key <- as.character(w)
    if (is.null(ucountlist[[key]])) {
      ucountlist[[key]] <- 1
    } else {
      ucountlist[[key]] <- ucountlist[[key]] + 1
    }
  }
  ucountlist
}

# `population` is assumed to be a vector of words defined elsewhere
expt_counts <- count_unique_words(population)
for (i in names(expt_counts))
  cat(i, expt_counts[[i]], "\n")