I have some sample data in R:

df <- data.frame(year = c("2020", "2020", "2020", "2020", "2021", "2021", "2021", "2021"), type = c("circle", "circle", "triangle", "star", "circle", "triangle", "star", "star"))

I need to find the mode of type for each year. If several values of type are tied for the highest count within a year, the tie should be broken by the following preference: star > circle > triangle.

So my desired output is:

2020 : 'circle',

2021 : 'star'

I am trying something similar to this:

mode <- function(codes){
  which.max(tabulate(codes))
}

library(dplyr)

mds <- df %>%
  group_by(year) %>%
  summarise(mode = mode(type))

This doesn't work because the type column is character rather than numeric.

solo

1 Answer

Consider changing the mode function so that it tabulates a numeric index, obtained by replacing each value with its matching position among the unique values:

mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
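
To make the intermediate steps concrete, here is a minimal sketch on a small made-up vector (`x` and `x_tie` are illustrative and not part of the question's data). `match(x, ux)` maps each value to its position in `ux`, and `tabulate` counts those positions. Since `which.max` returns the first maximum, this version breaks ties by order of first appearance in the data, not by any preference order.

```r
# Made-up example values, for illustration only
x <- c("circle", "circle", "triangle", "star")

ux <- unique(x)          # distinct values in order of first appearance
idx <- match(x, ux)      # position of each element of x within ux
counts <- tabulate(idx)  # number of occurrences per position
result <- ux[which.max(counts)]
print(result)

# On a tie, the value seen first wins
x_tie <- c("triangle", "star")
ux_tie <- unique(x_tie)
tie_result <- ux_tie[which.max(tabulate(match(x_tie, ux_tie)))]
print(tie_result)
```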

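Another option is to convert to a factor, since tabulate needs either a numeric or a factor input. Passing the levels in preference order also settles ties: which.max returns the first maximum, so among tied values the one listed earliest in the levels wins.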

mode <- function(x, lvls) {
  ux <- lvls
  ux[which.max(tabulate(factor(x, levels = ux)))]
}
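
A quick check of the tie-breaking behaviour, as a sketch under some assumptions: the function is restated under the hypothetical name `mode_pref` so the block runs on its own, `nbins` is passed explicitly so the count vector always has one entry per level, and the input vectors are made up.

```r
# Hypothetical name; same idea as the factor-based function above
mode_pref <- function(x, lvls) {
  # Factor codes follow the order of lvls, so counts[1] belongs to lvls[1], etc.
  counts <- tabulate(factor(x, levels = lvls), nbins = length(lvls))
  # which.max returns the first maximum, i.e. the most preferred tied level
  lvls[which.max(counts)]
}

prefs <- c("star", "circle", "triangle")

# Made-up inputs: a three-way tie and a two-way tie
three_way <- mode_pref(c("triangle", "circle", "star"), prefs)
two_way <- mode_pref(c("triangle", "triangle", "circle", "circle"), prefs)
print(three_way)
print(two_way)
```

In the three-way tie the result is the first level, star. In the two-way tie between circle and triangle, circle wins because it comes earlier in `prefs`.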

Now apply it to each group inside summarise:

library(dplyr)

df %>%
  group_by(year) %>%
  summarise(mode = mode(type, lvls = c('star', 'circle', 'triangle')))
 
# A tibble: 2 x 2
#  year  mode  
#* <chr> <chr> 
#1 2020  circle
#2 2021  star
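
If dplyr is not available, the same grouped result can be sketched in base R with `split` and `vapply`. This is an alternative under stated assumptions rather than part of the approach above: `mode_pref` and `prefs` are hypothetical names, and the data frame is rebuilt inline from the values in the data section so the block runs on its own.

```r
# Hypothetical helper: mode with ties broken by the order of lvls
mode_pref <- function(x, lvls) {
  counts <- tabulate(factor(x, levels = lvls), nbins = length(lvls))
  lvls[which.max(counts)]
}

# Same values as the data section, rebuilt inline
df <- data.frame(
  year = rep(c("2020", "2021"), each = 4),
  type = c("circle", "circle", "triangle", "star",
           "circle", "triangle", "star", "star"),
  stringsAsFactors = FALSE
)

prefs <- c("star", "circle", "triangle")

# Split type by year, then take the preferred mode within each group
res <- vapply(split(df$type, df$year), mode_pref, character(1), lvls = prefs)
print(res)
```

The result is a character vector named by year, which can be turned into a data frame with `data.frame(year = names(res), mode = unname(res))` if needed.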

data

df <- structure(list(year = c("2020", "2020", "2020", "2020", "2021", 
"2021", "2021", "2021"), type = c("circle", "circle", "triangle", 
"star", "circle", "triangle", "star", "star")), class = "data.frame",
row.names = c(NA, 
-8L))
akrun
  • Thanks for your answer! It mostly works in my case, but when the counts of type are equal within a year it doesn't give the correct output according to the preference I mentioned in my question (star > circle > triangle). Can you help with that? – solo Mar 03 '21 at 19:45
  • @solo try the updated function – akrun Mar 03 '21 at 19:49
  • It worked! Excellent job! – solo Mar 03 '21 at 19:53