Counting the number of elements with the values of x in a vector

Question

I have a vector of numbers:

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
         453,435,324,34,456,56,567,65,34,435)

How can I have R count the number of times a value x appears in the vector?

Shane · Accepted Answer · 2009-12-17T17:32:19.980

616

You can just use table():

> a <- table(numbers)
> a
numbers
  4   5  23  34  43  54  56  65  67 324 435 453 456 567 657 
  2   1   2   2   1   1   2   1   2   1   3   1   1   1   1

Then you can subset it:

> a[names(a)==435]
435 
  3

Or convert it into a data.frame if you're more comfortable working with that:

> as.data.frame(table(numbers))
   numbers Freq
1        4    2
2        5    1
3       23    2
4       34    2
...

edited Dec 17 '09 at 17:32

answered Dec 17 '09 at 17:25

Shane

98,550
35
224
217

25

Don't forget about potential floating point issues, especially with table, which coerces numbers to strings. – hadley Dec 17 '09 at 18:10

score 314 · Answer 2 · answered Dec 17 '09 at 18:09

The most direct way is sum(numbers == x).

numbers == x creates a logical vector which is TRUE at every location that x occurs, and when suming, the logical vector is coerced to numeric which converts TRUE to 1 and FALSE to 0.

However, note that for floating point numbers it's better to use something like: sum(abs(numbers - x) < 1e-6).

score 83 · Answer 3 · answered Dec 17 '09 at 17:55

83

I would probably do something like this

length(which(numbers==x))

But really, a better way is

table(numbers)

answered Dec 17 '09 at 17:55

Jesse

849
5
2

10

`table(numbers)` is going to do a lot more work than the easiest solution, `sum(numbers==x)`, because it's going to figure out the counts of all the other numbers in the list too. – Ken Williams Dec 18 '09 at 19:41
1

the problem with table is that it's more difficult to include it inside more complex calculus, for example using apply() on dataframes – skan Dec 02 '15 at 12:16

score 40 · Answer 4 · edited Aug 24 '16 at 14:58

40

There is also count(numbers) from plyr package. Much more convenient than table in my opinion.

edited Aug 24 '16 at 14:58

MichaelChirico

33,841
14
113
198

answered Jun 06 '13 at 14:49

geotheory

22,624
29
119
196

score 37 · Answer 5 · answered Dec 13 '12 at 21:43

My preferred solution uses rle, which will return a value (the label, x in your example) and a length, which represents how many times that value appeared in sequence.

By combining rle with sort, you have an extremely fast way to count the number of times any value appeared. This can be helpful with more complex problems.

Example:

> numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
> a <- rle(sort(numbers))
> a
  Run Length Encoding
    lengths: int [1:15] 2 1 2 2 1 1 2 1 2 1 ...
    values : num [1:15] 4 5 23 34 43 54 56 65 67 324 ...

If the value you want doesn't show up, or you need to store that value for later, make a a data.frame.

> b <- data.frame(number=a$values, n=a$lengths)
> b
    values n
 1       4 2
 2       5 1
 3      23 2
 4      34 2
 5      43 1
 6      54 1
 7      56 2
 8      65 1
 9      67 2
 10    324 1
 11    435 3
 12    453 1
 13    456 1
 14    567 1
 15    657 1

I find it is rare that I want to know the frequency of one value and not all of the values, and rle seems to be the quickest way to get count and store them all.

score 22 · Answer 6 · answered Apr 19 '12 at 13:13

22

There is a standard function in R for that

tabulate(numbers)

answered Apr 19 '12 at 13:13

Sergej Andrejev

9,091
11
71
108

ishandutta2007 · Answer 7 · 2017-06-07T13:22:04.410

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435 453,435,324,34,456,56,567,65,34,435)

> length(grep(435, numbers))
[1] 3


> length(which(435 == numbers))
[1] 3


> require(plyr)
> df = count(numbers)
> df[df$x == 435, ] 
     x freq
11 435    3


> sum(435 == numbers)
[1] 3


> sum(grepl(435, numbers))
[1] 3


> sum(435 == numbers)
[1] 3


> tabulate(numbers)[435]
[1] 3


> table(numbers)['435']
435 
  3 


> length(subset(numbers, numbers=='435')) 
[1] 3

score 12 · Answer 8 · answered May 15 '15 at 12:35

If you want to count the number of appearances subsequently, you can make use of the sapply function:

index<-sapply(1:length(numbers),function(x)sum(numbers[1:x]==numbers[x]))
cbind(numbers, index)

Output:

        numbers index
 [1,]       4     1
 [2,]      23     1
 [3,]       4     2
 [4,]      23     2
 [5,]       5     1
 [6,]      43     1
 [7,]      54     1
 [8,]      56     1
 [9,]     657     1
[10,]      67     1
[11,]      67     2
[12,]     435     1
[13,]     453     1
[14,]     435     2
[15,]     324     1
[16,]      34     1
[17,]     456     1
[18,]      56     2
[19,]     567     1
[20,]      65     1
[21,]      34     2
[22,]     435     3

score 10 · Answer 9 · answered Dec 17 '09 at 17:27

10

here's one fast and dirty way:

x <- 23
length(subset(numbers, numbers==x))

answered Dec 17 '09 at 17:27

JD Long

59,675
58
202
294

score 7 · Answer 10 · edited Feb 19 '16 at 04:31

7

You can change the number to whatever you wish in following line

length(which(numbers == 4))

edited Feb 19 '16 at 04:31

Matthew

7,440
1
24
49

answered Feb 18 '16 at 09:34

uttkarsh dharmadhikari

79
1
2

```sum(numbers == 4)``` will also do. – ninpnin Aug 02 '22 at 13:04

score 6 · Answer 11 · answered Jun 26 '20 at 12:15

One option could be to use vec_count() function from the vctrs library:

vec_count(numbers)

   key count
1  435     3
2   67     2
3    4     2
4   34     2
5   56     2
6   23     2
7  456     1
8   43     1
9  453     1
10   5     1
11 657     1
12 324     1
13  54     1
14 567     1
15  65     1

The default ordering puts the most frequent values at top. If looking for sorting according keys (a table()-like output):

vec_count(numbers, sort = "key")

   key count
1    4     2
2    5     1
3   23     2
4   34     2
5   43     1
6   54     1
7   56     2
8   65     1
9   67     2
10 324     1
11 435     3
12 453     1
13 456     1
14 567     1
15 657     1

score 4 · Answer 12 · answered Dec 26 '14 at 07:11

One more way i find convenient is:

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
(s<-summary (as.factor(numbers)))

This converts the dataset to factor, and then summary() gives us the control totals (counts of the unique values).

Output is:

4   5  23  34  43  54  56  65  67 324 435 453 456 567 657 
2   1   2   2   1   1   2   1   2   1   3   1   1   1   1

This can be stored as dataframe if preferred.

as.data.frame(cbind(Number = names(s),Freq = s), stringsAsFactors=F, row.names = 1:length(s))

here row.names has been used to rename row names. without using row.names, column names in s are used as row names in new dataframe

Output is:

     Number Freq
1       4    2
2       5    1
3      23    2
4      34    2
5      43    1
6      54    1
7      56    2
8      65    1
9      67    2
10    324    1
11    435    3
12    453    1
13    456    1
14    567    1
15    657    1

pomber · Answer 13 · 2014-12-26T17:18:14.983

4

Using table but without comparing with names:

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435)
x <- 67
numbertable <- table(numbers)
numbertable[as.character(x)]
#67 
# 2

table is useful when you are using the counts of different elements several times. If you need only one count, use sum(numbers == x)

edited Dec 26 '14 at 17:18

answered Dec 26 '14 at 17:06

pomber

23,132
10
81
94

Therii · Answer 14 · 2018-11-16T17:51:20.317

There are different ways of counting a specific elements

library(plyr)
numbers =c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,7,65,34,435)

print(length(which(numbers==435)))

#Sum counts number of TRUE's in a vector 
print(sum(numbers==435))
print(sum(c(TRUE, FALSE, TRUE)))

#count is present in plyr library 
#o/p of count is a DataFrame, freq is 1 of the columns of data frame
print(count(numbers[numbers==435]))
print(count(numbers[numbers==435])[['freq']])

score 2 · Answer 15 · answered Mar 12 '20 at 16:40

This is a very fast solution for one-dimensional atomic vectors. It relies on match(), so it is compatible with NA:

x <- c("a", NA, "a", "c", "a", "b", NA, "c")

fn <- function(x) {
  u <- unique.default(x)
  out <- list(x = u, freq = .Internal(tabulate(match(x, u), length(u))))
  class(out) <- "data.frame"
  attr(out, "row.names") <- seq_along(u)
  out
}

fn(x)

#>      x freq
#> 1    a    3
#> 2 <NA>    2
#> 3    c    2
#> 4    b    1

You could also tweak the algorithm so that it doesn't run unique().

fn2 <- function(x) {
  y <- match(x, x)
  out <- list(x = x, freq = .Internal(tabulate(y, length(x)))[y])
  class(out) <- "data.frame"
  attr(out, "row.names") <- seq_along(x)
  out
}

fn2(x)

#>      x freq
#> 1    a    3
#> 2 <NA>    2
#> 3    a    3
#> 4    c    2
#> 5    a    3
#> 6    b    1
#> 7 <NA>    2
#> 8    c    2

In cases where that output is desirable, you probably don't even need it to re-return the original vector, and the second column is probably all you need. You can get that in one line with the pipe:

match(x, x) %>% `[`(tabulate(.), .)

#> [1] 3 2 3 2 3 1 2 2

Really great solution! Thats also the fastest one I could come up with. It can be a little bit improved for performance for factor input using u <- if(is.factor(x)) x[!duplicated(x)] else unique(x). — Taz, May 25 '20 at 14:00

score 2 · Answer 16 · answered Aug 31 '21 at 15:17

Base r solution in 2021

aggregate(numbers, list(num=numbers), length)

       num x
1        4 2
2        5 1
3       23 2
4       34 2
5       43 1
6       54 1
7       56 2
8       65 1
9       67 2
10     324 1
11     435 3
12     453 1
13     456 1
14     567 1
15     657 1

tapply(numbers, numbers, length)
  4   5  23  34  43  54  56  65  67 324 435 453 456 567 657 
  2   1   2   2   1   1   2   1   2   1   3   1   1   1   1 

by(numbers, list(num=numbers), length)
num: 4
[1] 2
-------------------------------------- 
num: 5
[1] 1
-------------------------------------- 
num: 23
[1] 2
-------------------------------------- 
num: 34
[1] 2
-------------------------------------- 
num: 43
[1] 1
-------------------------------------- 
num: 54
[1] 1
-------------------------------------- 
num: 56
[1] 2
-------------------------------------- 
num: 65
[1] 1
-------------------------------------- 
num: 67
[1] 2
-------------------------------------- 
num: 324
[1] 1
-------------------------------------- 
num: 435
[1] 3
-------------------------------------- 
num: 453
[1] 1
-------------------------------------- 
num: 456
[1] 1
-------------------------------------- 
num: 567
[1] 1
-------------------------------------- 
num: 657
[1] 1

score 1 · Answer 17 · edited Dec 17 '18 at 19:38

1

This can be done with outer to get a metrix of equalities followed by rowSums, with an obvious meaning.
In order to have the counts and numbers in the same dataset, a data.frame is first created. This step is not needed if you want separate input and output.

df <- data.frame(No = numbers)
df$count <- rowSums(outer(df$No, df$No, FUN = `==`))

edited Dec 17 '18 at 19:38

Rui Barradas

70,273
8
34
66

answered Dec 17 '18 at 15:52

GWD

1,387
10
22

score 1 · Answer 18 · answered Feb 21 '20 at 15:09

A method that is relatively fast on long vectors and gives a convenient output is to use lengths(split(numbers, numbers)) (note the S at the end of lengths):

# Make some integer vectors of different sizes
set.seed(123)
x <- sample.int(1e3, 1e4, replace = TRUE)
xl <- sample.int(1e3, 1e6, replace = TRUE)
xxl <-sample.int(1e3, 1e7, replace = TRUE)

# Number of times each value appears in x:
a <- lengths(split(x,x))

# Number of times the value 64 appears:
a["64"]
#~ 64
#~ 15

# Occurences of the first 10 values
a[1:10]
#~ 1  2  3  4  5  6  7  8  9 10 
#~ 13 12  6 14 12  5 13 14 11 14

The output is simply a named vector.
The speed appears comparable to rle proposed by JBecker and even a bit faster on very long vectors. Here is a microbenchmark in R 3.6.2 with some of the functions proposed:

library(microbenchmark)

f1 <- function(vec) lengths(split(vec,vec))
f2 <- function(vec) table(vec)
f3 <- function(vec) rle(sort(vec))
f4 <- function(vec) plyr::count(vec)

microbenchmark(split = f1(x),
               table = f2(x),
               rle = f3(x),
               plyr = f4(x))
#~ Unit: microseconds
#~   expr      min        lq      mean    median        uq      max neval  cld
#~  split  402.024  423.2445  492.3400  446.7695  484.3560 2970.107   100  b  
#~  table 1234.888 1290.0150 1378.8902 1333.2445 1382.2005 3203.332   100    d
#~    rle  227.685  238.3845  264.2269  245.7935  279.5435  378.514   100 a   
#~   plyr  758.866  793.0020  866.9325  843.2290  894.5620 2346.407   100   c 

microbenchmark(split = f1(xl),
               table = f2(xl),
               rle = f3(xl),
               plyr = f4(xl))
#~ Unit: milliseconds
#~   expr       min        lq      mean    median        uq       max neval cld
#~  split  21.96075  22.42355  26.39247  23.24847  24.60674  82.88853   100 ab 
#~  table 100.30543 104.05397 111.62963 105.54308 110.28732 168.27695   100   c
#~    rle  19.07365  20.64686  23.71367  21.30467  23.22815  78.67523   100 a  
#~   plyr  24.33968  25.21049  29.71205  26.50363  27.75960  92.02273   100  b 

microbenchmark(split = f1(xxl),
               table = f2(xxl),
               rle = f3(xxl),
               plyr = f4(xxl))
#~ Unit: milliseconds
#~   expr       min        lq      mean    median        uq       max neval  cld
#~  split  296.4496  310.9702  342.6766  332.5098  374.6485  421.1348   100 a   
#~  table 1151.4551 1239.9688 1283.8998 1288.0994 1323.1833 1385.3040   100    d
#~    rle  399.9442  430.8396  464.2605  471.4376  483.2439  555.9278   100   c 
#~   plyr  350.0607  373.1603  414.3596  425.1436  437.8395  506.0169   100  b

Importantly, the only function that also counts the number of missing values NA is plyr::count. These can also be obtained separately using sum(is.na(vec))

score 1 · Answer 19 · answered Jul 11 '20 at 13:25

Here is a way you could do it with dplyr:

library(tidyverse)

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)
ord <- seq(1:(length(numbers)))

df <- data.frame(ord,numbers)

df <- df %>%
  count(numbers)

numbers     n
     <dbl> <int>
 1       4     2
 2       5     1
 3      23     2
 4      34     2
 5      43     1
 6      54     1
 7      56     2
 8      65     1
 9      67     2
10     324     1
11     435     3
12     453     1
13     456     1
14     567     1
15     657     1

score 0 · Answer 20 · answered Nov 13 '20 at 14:28

You can make a function to give you results.

# your list
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
         453,435,324,34,456,56,567,65,34,435)

function1<-function(x){
    if(x==value){return(1)}else{ return(0) }
}

# set your value here
value<-4

# make a vector which return 1 if it equal to your value, 0 else
vector<-sapply(numbers,function(x) function1(x))
sum(vector)

result: 2

Counting the number of elements with the values of x in a vector

20 Answers20

Linked

Related