Calculating the mode or 2nd/3rd/4th most common value

Question

Surely there has to be a function out there in some package for this?

I've searched and I've found this function to calculate the mode:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

But I'd like a function that lets me easily calculate the 2nd/3rd/4th/nth most common value in a column of data.

Ultimately I will apply this function to a large number of dplyr::group_by()s.

Thank you for your help!

That might need a sort(). And that could obviously be enhanced to handle multiple modes. — IRTFM, Sep 01 '16 at 01:20

Zheyuan Li · Accepted Answer · 2016-09-01T01:18:15.653

Maybe you could try

f <- function (x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

This returns the unique values of x, sorted by decreasing frequency. The first is the mode, the second is the 2nd most common value, and so on.
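To pull out just the n-th most common value, the ranked vector can simply be indexed. A minimal sketch, assuming a hypothetical helper name nth_common (not from any package) and made-up data:

```r
# Unique values of x ordered by decreasing frequency (same as f above)
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

# Hypothetical helper: n-th most common value of x.
# Indexing past the end of a vector yields NA, so asking for more
# distinct values than x contains returns NA.
nth_common <- function(x, n = 1) f(x)[n]

x <- c(3, 1, 3, 2, 3, 2)  # made-up example data
nth_common(x, 1)  # 3 (appears three times)
nth_common(x, 2)  # 2 (appears twice)
nth_common(x, 3)  # 1 (appears once)
nth_common(x, 4)  # NA: only three distinct values
```

Note that sort() drops NA values by default, so missing values are never counted as a candidate.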

Another method is based on table():

g <- function (x) as.numeric(names(sort(table(x), decreasing = TRUE)))

But this is not recommended, because the input vector x is first coerced to a factor, which is very slow for a large vector. Also, on exit we have to extract the character names of the table and coerce them back to numeric.
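The coercion also matters for non-numeric input. A small illustration, assuming a character vector y made up for this example: g() cannot turn the table names back into numbers, while the rle()-based f() returns the values in their original type.

```r
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])
g <- function(x) as.numeric(names(sort(table(x), decreasing = TRUE)))

y <- c("b", "a", "b", "c", "b", "a")  # made-up example data

f(y)                    # "b" "a" "c": still character
suppressWarnings(g(y))  # NA NA NA: names cannot be coerced to numeric
```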

Example

set.seed(0); x <- rpois(100, 10)
f(x)
# [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16

Let's compare with the contingency table from table():

tab <- sort(table(x), decreasing = TRUE)
tab
# 11 12  7  9  8 13 10 14  5 15  6  2  3 16
# 14 14 11 11 10 10  9  7  5  4  2  1  1  1

as.numeric(names(tab))
# [1] 11 12  7  9  8 13 10 14  5 15  6  2  3 16

So the results are the same.
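Since the goal is to apply this per group, here is a hedged sketch using base R's tapply() on a made-up data frame df; the column names grp and val are assumptions for illustration. The same expression could equally be used inside dplyr::summarise() after group_by().

```r
f <- function(x) with(rle(sort(x)), values[order(lengths, decreasing = TRUE)])

# Made-up example data: two groups with different value distributions
df <- data.frame(
  grp = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b"),
  val = c(1, 1, 1, 2, 2, 3, 5, 5, 5, 7)
)

# Second most common value of val within each group;
# f(v)[2] is NA for a group with fewer than two distinct values
second <- tapply(df$val, df$grp, function(v) f(v)[2])
second
# a b
# 2 7
```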

I hear you. Not right or wrong, just shorter. You might anger the `T` and `F` sticklers. They don't like abbreviating the logicals. — Pierre L, Sep 01 '16 at 01:17

Tony Chang · Answer 2 · 2022-08-14T05:06:24.953

Here is an R function I wrote (inspired by several other SO posts) that may work for your goal. I use a local dataset on religious affiliation to illustrate it.

It's simple; only base R functions are involved: length, match, sort, tabulate, table, unique, which, as.character.

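    Find_Nth_Mode = function(d, N = 2) {
      # Helper: N-th largest element of x (N = 1 is the maximum)
      maxN = function(x, N) {
        len = length(x)
        if (N > len) {
          warning('N greater than length(x).  Setting N=length(x)')
          N = len
        }
        sort(x, partial = len - N + 1)[len - N + 1]
      }

      ux = unique(as.character(d))   # distinct values, as character
      a1 = tabulate(match(d, ux))    # count of each distinct value
      a2 = maxN(a1, N)               # N-th largest count
      a3 = which(a1 == a2)           # position(s) holding that count
      ux[a3]                         # value(s) with that count
    }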
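One thing to be aware of: maxN() ranks the counts themselves, so when two values share the top count, N = 1 and N = 2 both return the same tied pair. If ranking by distinct count is preferred, a possible variant (the name nth_mode_ties is hypothetical and not taken from the referenced posts) could look like this:

```r
# Hypothetical variant: values whose count is the n-th largest
# *distinct* count, so tied values are returned together, once
nth_mode_ties <- function(d, n = 1) {
  ux <- unique(d)
  counts <- tabulate(match(d, ux))             # count per distinct value
  levels_desc <- sort(unique(counts), decreasing = TRUE)
  if (n > length(levels_desc)) return(ux[0])   # empty result, same type
  ux[counts == levels_desc[n]]
}

d <- c("a", "a", "b", "b", "c")  # made-up data: "a" and "b" tie
nth_mode_ties(d, 1)  # "a" "b"
nth_mode_ties(d, 2)  # "c"
nth_mode_ties(d, 3)  # character(0)
```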

Sample Output

> table(religion_data$relig11)
                   0.None 1.Protestant_Conservative      2.Protestant_Liberal                3.Catholic 
                    34486                      6134                     19678                     36880 
               4.Orthodox             5.Islam_Sunni              6.Islam_Shia                   7.Hindu 
                    20702                     28170                       668                      4653 
               8.Buddhism                  9.Jewish                  10.Other 
                     9983                       381                      6851 
> Find_Nth_Mode(religion_data$relig11, 1)
[1] "3.Catholic"
> Find_Nth_Mode(religion_data$relig11, 2)
[1] "0.None"
> Find_Nth_Mode(religion_data$relig11, 3)
[1] "5.Islam_Sunni"

References: I am grateful to these posts, from which I took the two functions and integrated them into one:

Function to find the N-th largest value: Fastest way to find second (third...) highest/lowest value in vector or column
How to find the second largest mode value: Calculating the mode or 2nd/3rd/4th most common value
