Calculate frequency of occurrence in an array using R

Question

I have an array

a <- c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)

I would like to use a command that tells me which number occurs most frequently in the array.

Is there a simple command for this?

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2012-12-13T06:29:16.850

28

The table() function is sufficient for this, and particularly useful if your data have more than one mode.

Consider the following options, all related to table() and max().

# Your vector
a = c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)

# Basic frequency table
table(a)
# a
# 1 2 3 4 5 6 7 
# 5 1 1 1 5 1 4 

# Only gives me the value for highest frequency
# Doesn't tell me which number that is though
max(table(a))
# [1] 5

# Gives me a logical vector, which might be useful
# but not what you're asking for in this question
table(a) == max(table(a))
# a
#    1     2     3     4     5     6     7 
# TRUE FALSE FALSE FALSE  TRUE FALSE FALSE 

# This is probably more like what you're looking for
# (the names are the values; the numbers are their positions in the table)
which(table(a) == max(table(a)))
# 1 5 
# 1 5 

# Or, maybe this
names(which(table(a) == max(table(a))))
# [1] "1" "5"
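The names returned above are character strings, because table() stores the distinct values as names. If you need the modes back as numbers, a small wrapper can convert them. The helper name `all_modes()` is hypothetical, and the `as.numeric()` conversion assumes the input is numeric:

```r
# Hypothetical helper: return every most-frequent value as a number.
# Assumes x is numeric; for character data, drop the as.numeric() call.
all_modes <- function(x) {
  tab <- table(x)                          # counts of each distinct value
  as.numeric(names(tab)[tab == max(tab)])  # keep all values tied for the max
}

a <- c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)
all_modes(a)
# [1] 1 5
```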

As indicated in the comments, in some cases you might want to see the two or three most commonly occurring values, in which case sort() is useful:

sort(table(a))
# a
# 2 3 4 6 7 1 5 
# 1 1 1 1 4 5 5
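To keep only the top few values, one option is to sort in decreasing order and take the first entries with head(). This is a sketch; values tied on count may come out in either order, so don't rely on the order among ties:

```r
a <- c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)

# Sort counts from most to least frequent, then keep the first three entries
top3 <- head(sort(table(a), decreasing = TRUE), 3)
top3
```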

You can also set a threshold for which values to return in your table. For instance, if you wanted to return only those numbers which occurred more than once:

sort(table(a)[table(a) > 1])
# a
# 7 1 5 
# 4 5 5
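Since the expression above builds table(a) twice, it can be tidier to store the table once and then filter it. The function name `frequent_values()` and its `min_count` argument are hypothetical, just for illustration:

```r
# Hypothetical helper: values occurring at least min_count times,
# sorted from most to least frequent.
frequent_values <- function(x, min_count = 2) {
  tab <- table(x)                                  # build the frequency table once
  sort(tab[tab >= min_count], decreasing = TRUE)   # filter, then sort by count
}

a <- c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)
frequent_values(a)                 # values seen more than once
frequent_values(a, min_count = 5)  # values seen at least five times
```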

edited Dec 13 '12 at 06:29

answered Dec 12 '12 at 14:17

A5C1D2H2I1M1N2O1R2T1

The `which.max` function could simplify the above a bit (but the long version given here is great for understanding). – Greg Snow Dec 12 '12 at 16:58
@GregSnow, that was the basic idea behind the extended version. As for the use of `which.max`, doesn't that return only the *first* matching value? That is, if one were to use `names(which.max(table(a)))`, I think (I can't check right now) they would only get `"1"` as the answer. Or did you have something else in mind that could be edited into the answer above? – A5C1D2H2I1M1N2O1R2T1 Dec 12 '12 at 17:06
You are correct that `which.max` only returns the first match. The chance of having more than one value equalling the max decreases with the number of possible values and the size of the vector, so it is still reasonable in some cases, but yours is the more general solution. I prefer looking at the whole table (as you start with), because if the max is 100 and another value has 99, that is still very interesting, but automated "mode" procedures will not show it. – Greg Snow Dec 12 '12 at 17:18
@GregSnow, thanks for your comments, and good point about sometimes being interested in other values that might be close in frequency to the mode. I edited my answer to include a suggestion to use `sort(table(a))` so that the most frequently occurring values are "clumped" together. My general suggestion for these types of analyses (with larger datasets, of course) is to first plot a stripchart, stem-and-leaf plot, histogram, or density plot to get an "overview" of the data distribution before jumping into different measures of central tendency. – A5C1D2H2I1M1N2O1R2T1 Dec 13 '12 at 06:35
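The difference discussed in these comments is easy to see on the question's vector: which.max() stops at the first maximum, while comparing every count against max() keeps all ties:

```r
a <- c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)
tab <- table(a)

# which.max() returns only the position of the first maximum
names(which.max(tab))
# [1] "1"

# Comparing every count with the maximum keeps all tied values
names(tab)[tab == max(tab)]
# [1] "1" "5"
```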

score 5 · Answer 2 · answered Nov 05 '14 at 22:45

Use the table() function:

## Your vector:
a <- c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)

## Frequency table
counts <- table(a)

## The most frequent value and its count
counts[which.max(counts)]
# 1
# 5

## Or just the most frequent value
## (which.max() reports only the first of any tied values)
names(counts)[which.max(counts)]
# [1] "1"
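If you'd rather see every value next to its count instead of only the first mode, one option is to convert the table to a data frame. This sketch relies on as.data.frame() for tables, which names the count column `Freq` and the value column after the variable:

```r
a <- c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)

# One row per distinct value; the value column takes the name "a"
freq <- as.data.frame(table(a), stringsAsFactors = FALSE)

# Order rows from most to least frequent
freq[order(freq$Freq, decreasing = TRUE), ]
```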

Carl Witthoft · Answer 3 · 2012-12-12T18:22:04.737

2

A few years ago I wrote some personal code to find the mode and a little more (as Ananda showed, it's pretty obvious stuff):

smode <- function(x) {
    xtab <- table(x)
    modes <- xtab[max(xtab) == xtab]     # all values tied for the highest count
    mag <- as.numeric(modes[1])          # in case of multiple modes, this is safer
    # themodes <- names(modes)           # would keep the values as strings
    themodes <- as.numeric(names(modes))
    mout <- list(themodes = themodes, modeval = mag)
    return(mout)
}

Blah blah copyright blah blah use as you like but don't make money off it.

edited Dec 12 '12 at 18:22

answered Dec 12 '12 at 16:21

Carl Witthoft

Shouldn't you include some error checking for when there are *no* modes? (Not that I shouldn't do the same with my answer.) Also, how does your last statement about copyright etc. work with a "cc-wiki" license? +1 for tidy presentation of the output though. – A5C1D2H2I1M1N2O1R2T1 Dec 12 '12 at 16:38
There's always a mode, pathologically, since there's always a maximum value in a `table` object. Granted it's not much use to return a list as long as the input :-). And I was just joking about copyright--thought the "blah blah" gave that away. – Carl Witthoft Dec 12 '12 at 18:20
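Following up on the "no modes" question: one way to add such a check is to report no mode when the input is empty or when every distinct value occurs equally often (instead of returning every value). This is a hypothetical variant, `smode_strict()`, not part of the code above, and the "no mode" convention it uses is an assumption:

```r
# Hypothetical variant of smode() that can report "no mode".
smode_strict <- function(x) {
  xtab <- table(x)
  counts <- as.vector(xtab)
  # Empty input, or more than one distinct value with all counts equal:
  # treat as having no mode rather than returning every value.
  if (length(counts) == 0 || (length(counts) > 1 && all(counts == counts[1]))) {
    return(list(themodes = numeric(0), modeval = NA_integer_))
  }
  modes <- xtab[counts == max(counts)]   # all values tied for the maximum
  list(themodes = as.numeric(names(modes)), modeval = max(counts))
}

smode_strict(c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7))  # modes 1 and 5, count 5
smode_strict(1:4)                                      # no mode
```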

score 0 · Answer 4 · edited May 23 '17 at 12:17

What you want is the mode of the data: there are a variety of options for calculating it. The modeest package has a set of functions for mode estimation, but it might be overkill for what you want.
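If you'd rather avoid an extra package and keep the result as numbers rather than names, one base R option pairs unique() with match() and tabulate(). The function name `mode_values()` is hypothetical; all tied values are returned, in order of first appearance:

```r
# Hypothetical base R helper: all most-frequent values, same type as x.
mode_values <- function(x) {
  if (length(x) == 0) return(x)     # no data, no mode
  ux <- unique(x)                   # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))  # how often each distinct value occurs
  ux[counts == max(counts)]         # keep every value tied for the maximum
}

a <- c(1,1,1,1,1,2,3,4,5,5,5,5,5,6,7,7,7,7)
mode_values(a)
# [1] 1 5
```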
