Count number of occurrences for each unique value

Question

Let's say I have:

v = rep(c(1,2, 2, 2), 25)

Now, I want to count the number of times each unique value appears. unique(v) returns what the unique values are, but not how many times each one occurs.

> unique(v)
[1] 1 2

I want something that gives me

length(v[v==1])
[1] 25
length(v[v==2])
[1] 75

but as a more general one-liner :) Something close (but not quite) like this:

#<doesn't work right> length(v[v==unique(v)])
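The attempt above does not work because `v == unique(v)` compares element by element, recycling `unique(v)` as 1, 2, 1, 2, ... along `v`, instead of comparing `v` against each unique value in turn. A minimal base-R sketch of the intended one-liner runs the comparison once per unique value with `sapply()`:

```r
v <- rep(c(1, 2, 2, 2), 25)

# For each distinct value u, count how many elements of v equal it
counts <- sapply(unique(v), function(u) sum(v == u))
names(counts) <- unique(v)  # label each count with its value
counts
```

This scans `v` once per distinct value, so it is only reasonable when there are few distinct values; `table()` does the same job in one pass.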

score 222 · Accepted Answer · edited Sep 26 '15 at 00:37

Perhaps table is what you are after?

dummyData = rep(c(1,2, 2, 2), 25)

table(dummyData)
# dummyData
#  1  2 
# 25 75

## or another presentation of the same data
as.data.frame(table(dummyData))
#    dummyData Freq
#  1         1   25
#  2         2   75

edited Sep 26 '15 at 00:37 by Gregor Thomas
answered Nov 18 '10 at 13:23 by Chase

Ah, yes, I can use this, with some slight modification: t(as.data.frame(table(v))[,2]) is exactly what I need, thank you – gakera Nov 18 '10 at 13:30
I used to do this awkwardly with `hist`. `table` seems quite a bit slower than `hist`. I wonder why. Can anyone confirm? – Museful Aug 22 '13 at 23:03
Chase, any chance to order by frequency? I have the exact same problem, but my table has roughly 20000 entries and I'd like to know how frequent the most common entries are. – Torvon Dec 01 '14 at 16:25
@Torvon - sure, just use `order()` on the results. i.e. `x <- as.data.frame(table(dummyData)); x[order(x$Freq, decreasing = TRUE), ]` – Chase Dec 02 '14 at 20:22
This method is not good: it only suits data with few distinct values and many repeats, not large continuous data with few duplicated records. – Deep North Oct 10 '17 at 11:31
To count the number of levels you may also use `lapply(DF, function(x) length(table(x)))` – Peter Feb 20 '19 at 12:40
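For Torvon's question about ordering by frequency, `sort()` also works directly on the table and keeps the value names attached; `head()` then picks out the most common entries. A small sketch, where the vector `x` is made-up example data:

```r
x <- c("b", "a", "c", "b", "c", "b")

# Count occurrences, then order from most to least frequent
freq <- sort(table(x), decreasing = TRUE)
freq

# The two most common values and their counts
head(freq, 2)
```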

score 42 · Answer 2 · answered Sep 07 '15 at 19:08

If you have multiple grouping factors (i.e. a data frame with several columns), you can use the dplyr package to count the occurrences of each combination of factor values:

library("dplyr")
data %>% group_by(factor1, factor2) %>% summarize(count=n())

It uses the pipe operator %>% to chain function calls on the data frame data.
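A self-contained version of the snippet, assuming a small made-up data frame whose columns are named `factor1` and `factor2` as in the answer. The `.groups = "drop"` argument (available in dplyr 1.0.0 and later) just returns an ungrouped result:

```r
library(dplyr)

# Hypothetical example data with two grouping columns
data <- data.frame(
  factor1 = c("a", "a", "b", "b", "b"),
  factor2 = c("x", "y", "x", "x", "y")
)

# One row per (factor1, factor2) combination, with its number of rows
res <- data %>%
  group_by(factor1, factor2) %>%
  summarize(count = n(), .groups = "drop")
res
```

The shorter `count(data, factor1, factor2)` from the comment below gives the same counts in a column named `n`.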

answered Sep 07 '15 at 19:08 by antoine

Alternatively, and a bit shorter: `data %>% count(factor1, factor2)` – David Sep 25 '20 at 11:44

score 29 · Answer 3 · answered Sep 12 '14 at 20:09

Here is a one-line approach using aggregate:

> aggregate(data.frame(count = v), list(value = v), length)

  value count
1     1    25
2     2    75

answered Sep 12 '14 at 20:09 by SeaSprite

One-liner indeed instead of using unique() + something else. Wonderful! – Martin Mar 05 '21 at 09:07
NB: This doesn't include the NA values – dsg38 Feb 09 '22 at 11:55
aggregate is underappreciated! – vonjd May 12 '22 at 08:57
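On the NA point: aggregate() omits rows whose grouping value is missing, and table() also skips NA by default. table() has a `useNA` argument ("no", "ifany", "always") that adds an NA cell. A sketch with made-up data:

```r
w <- c(1, 2, NA, 2, NA)

counts_no_na <- table(w)                     # NA values are not counted
counts_with_na <- table(w, useNA = "ifany")  # adds an <NA> cell because w contains NA
counts_with_na
```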

score 17 · Answer 4 · edited Sep 22 '20 at 15:14

length(unique(df$col)) is the simplest way I can see to get the number of distinct values in a column.
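Note that `length(unique(...))` returns how many distinct values there are, not how often each one occurs. A quick comparison on the question's vector, using a plain vector in place of `df$col`:

```r
v <- rep(c(1, 2, 2, 2), 25)

n_distinct_values <- length(unique(v))  # how many different values: 2
per_value_counts <- table(v)            # how often each value occurs: 25 and 75
```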

edited Sep 22 '20 at 15:14 by radek
answered Jul 21 '20 at 19:25 by Jeff Henderson

R has probably evolved a lot in the last 10 years, since I asked this question. – gakera Jul 22 '20 at 10:51

score 13 · Answer 5 · edited May 23 '17 at 12:18

The table() function is a good way to go, as Chase suggested. If you are analyzing a large dataset, an alternative is the special symbol .N in the data.table package, which gives the number of rows in each group.

Make sure you have installed the data.table package:

install.packages("data.table")

Code:

# Load the data.table package
library(data.table)

# Generate a data.table object by drawing a number 10^7 times
# from 1 to 10 with replacement
DT <- data.table(x = sample(1:10, 1E7, TRUE))

# Count the frequency of each distinct value of x
DT[, .N, by = x]
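To order the result by frequency, the `.N` result can be chained with another `[...]` call. A sketch on a small made-up table instead of the 10^7 random draws:

```r
library(data.table)

# Small made-up example data
DT <- data.table(x = c(3, 1, 3, 2, 3, 1))

# .N is the row count within each group; chaining [order(-N)] sorts by it
counts <- DT[, .N, by = x][order(-N)]
counts
```

With `by =`, groups appear in order of first appearance; `keyby =` would instead sort the result by `x`.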

score 7 · Answer 6 · edited May 23 '17 at 12:18

To get an un-dimensioned integer vector that contains the count of each unique value, use c().

dummyData = rep(c(1, 2, 2, 2), 25) # Chase's reproducible data
c(table(dummyData)) # get un-dimensioned integer vector
 1  2 
25 75

str(c(table(dummyData)) ) # confirm structure
 Named int [1:2] 25 75
 - attr(*, "names")= chr [1:2] "1" "2"

This may be useful if you need to feed the counts of unique values into another function, and is shorter and more idiomatic than the t(as.data.frame(table(dummyData))[,2]) posted in a comment to Chase's answer. Thanks to Ricardo Saporta, who pointed this out to me here.
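As an illustration of passing the counts to another function (the weighted mean here is only an example), note that the values themselves come back from the names as character strings and need converting:

```r
dummyData <- rep(c(1, 2, 2, 2), 25)

counts <- c(table(dummyData))        # named integer vector
values <- as.numeric(names(counts))  # names are character; convert back to numbers

# Example consumer: the mean of dummyData recovered from the counts alone
weighted.mean(values, counts)
```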

score 7 · Answer 7 · answered Jul 04 '16 at 00:17

This works for me to count how many unique values there are. Take your vector v:

length(summary(as.factor(v),maxsum=50000))

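Note: set maxsum large enough to cover the number of unique values; otherwise summary() lumps the least frequent ones into an "(Other)" entry and the length comes out too small.

or with the magrittr package:

library(magrittr)
v %>% as.factor %>% summary(maxsum=50000) %>% length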
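A small demonstration of why maxsum matters: for a factor, summary() keeps at most `maxsum` entries (100 by default) and merges the least frequent levels into "(Other)". Made-up data:

```r
f <- factor(c("a", "a", "b", "c"))

s_all <- summary(f)                  # a 2, b 1, c 1
s_lumped <- summary(f, maxsum = 2)   # keeps "a", merges b and c into "(Other)"
s_lumped
```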

answered Jul 04 '16 at 00:17 by Anthony Ebert

score 6 · Answer 8 · answered Sep 17 '17 at 02:06

Converting the values to a factor and calling summary() also works.

> v = rep(as.factor(c(1,2, 2, 2)), 25)
> summary(v)
 1  2 
25 75
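One side effect of going through a factor: if the possible values are declared as levels, values that never occur still appear with a count of 0. A sketch where the extra level 3 is a made-up addition:

```r
# Declaring the possible values as levels: 3 never occurs
v <- factor(rep(c(1, 2, 2, 2), 25), levels = c(1, 2, 3))

counts <- summary(v)  # 1: 25, 2: 75, 3: 0
counts
```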

answered Sep 17 '17 at 02:06 by sedeh

score 5 · Answer 9 · answered May 28 '18 at 08:56

You can also try a tidyverse approach:

library(tidyverse)
tibble(value = dummyData) %>%
    count(value)
# A tibble: 2 x 2
  value     n
  <dbl> <int>
1     1    25
2     2    75

answered May 28 '18 at 08:56 by Roman

score 4 · Answer 10 · edited Sep 28 '13 at 10:58

If you need the count of each value as an additional column in the data frame containing your values (a column that may represent sample size, for example), plyr provides a neat way:

data_frame <- data.frame(v = rep(c(1,2, 2, 2), 25))

library("plyr")
data_frame <- ddply(data_frame, .(v), transform, n = length(v))
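For comparison, a base-R sketch of the same extra column using `ave()`, which applies a function within each group and returns a vector aligned with the original rows:

```r
data_frame <- data.frame(v = rep(c(1, 2, 2, 2), 25))

# ave() applies FUN within each group of v and returns one value per row
data_frame$n <- ave(data_frame$v, data_frame$v, FUN = length)
head(data_frame, 4)
```

One difference worth knowing: ddply returns the rows grouped by `v`, whereas `ave()` keeps the original row order.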

edited Sep 28 '13 at 10:58
answered May 08 '13 at 14:38 by Lionel Henry

or `ddply(data_frame, .(v), count)`. Also worth making it explicit that you need a `library("plyr")` call to make `ddply` work. – Brian Diggs May 08 '13 at 21:45
Seems strange to use `transform` instead of `mutate` when using `plyr`. – Gregor Thomas Sep 26 '15 at 00:38

score 1 · Answer 11 · answered Aug 07 '20 at 07:06

I know there are many other answers, but here is another way to do it using the sort and rle functions. rle stands for Run Length Encoding: it counts runs of consecutive equal values (see ?rle). Sorting first places equal values next to each other, so each run length is the count of one value.

test.data = rep(c(1, 2, 2, 2), 25)
rle(sort(test.data))
## Run Length Encoding
##   lengths: int [1:2] 25 75
##   values : num [1:2] 1 2

If you capture the result, you can access the lengths and values as follows:

## rle returns a list with two items.
result.counts <- rle(sort(test.data))
result.counts$lengths
## [1] 25 75
result.counts$values
## [1] 1 2
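The sort step is essential. A sketch showing what happens without it, and one way to turn the result into a named count vector:

```r
test.data <- rep(c(1, 2, 2, 2), 25)

# Without sorting, rle() sees 50 short runs (1, then 2 2 2, then 1, ...)
n_runs_unsorted <- length(rle(test.data)$lengths)

# After sorting, each value forms a single run, so the run lengths are the counts
r <- rle(sort(test.data))
counts <- setNames(r$lengths, r$values)
counts
```

Note that sort() drops NA values by default, so they are not counted here either.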

score 1 · Answer 12 · answered Jul 19 '22 at 06:00

You can also try dplyr::count:

df <- dplyr::tibble(x=c('a','b','b','c','c','d'), y=1:6)

dplyr::count(df, x, sort = TRUE)

# A tibble: 4 x 2
  x         n
  <chr> <int>
1 b         2
2 c         2
3 a         1
4 d         1

user2771312 · Answer 13 · 2013-09-17T06:17:59.170

If you want to run unique on a data.frame (e.g., train.data), and also get the counts (which can be used as the weight in classifiers), you can do the following:

unique.count = function(train.data, all.numeric=FALSE) {
  # first convert each row in the data.frame to a string
  train.data.str = apply(train.data, 1, function(x) paste(x, collapse=','))
  # use table to index and count the strings
  train.data.str.t = table(train.data.str)
  # get the unique data string from the row.names
  train.data.str.uniq = row.names(train.data.str.t)
  weight = as.numeric(train.data.str.t)
  # convert the unique data string to data.frame
  if (all.numeric) {
    train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
      function(x) as.numeric(unlist(strsplit(x, split=","))))))
  } else {
    train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
      function(x) unlist(strsplit(x, split=",")))))
  }
  names(train.data.uniq) = names(train.data)
  list(data=train.data.uniq, weight=weight)
}
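A possible base-R alternative, sketched here as a substitute technique rather than the answer's own method: aggregate() with a formula counts each combination of columns directly, without the round trip through strings, so column types are preserved. The data frame is made up, and rows with NA are dropped by aggregate's default `na.action`:

```r
# Made-up example data frame with a repeated row
train.data <- data.frame(a = c(1, 1, 2, 1), b = c("x", "x", "y", "z"))

# Add a constant column, then count it within every combination of the other columns
counted <- aggregate(weight ~ ., data = transform(train.data, weight = 1), FUN = length)
counted
```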

score -2 · Answer 14 · answered May 22 '13 at 07:49

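count_unique_words <- function(wlist) {
  # Expects a character vector; each element is used as a list name
  ucountlist <- list()
  unamelist <- c()
  for (i in wlist) {
    if (is.element(i, unamelist)) {
      # Seen before: increment its count
      ucountlist[[i]] <- ucountlist[[i]] + 1
    } else {
      # First occurrence: start its count at 1 and remember the name
      ucountlist[[i]] <- 1
      unamelist <- c(unamelist, i)
    }
  }
  ucountlist
}

# Example input (hypothetical word vector)
population <- c("cat", "dog", "cat", "bird", "cat")
expt_counts <- count_unique_words(population)
for (i in names(expt_counts))
  cat(i, expt_counts[[i]], "\n")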
