
I want to group by column a and, for each unique value of a, pick the most common level of the factor b. For example:

tibble(a = c(1,1,1,2,2,2), b = factor(c('cat', 'dog', 'cat', 'cat', 'dog', 'dog'))) %>%
    reframe(b = most_common(b), .by = a)

I want this to produce:

a b
1 cat
2 dog

However, the most_common function doesn't exist. Is there an efficient R function for this purpose? This must be a common need in data cleaning, which is what I need it for. I searched and found people implementing their own mode functions. I could use one of those, but they seemed inefficient. Is there a better approach to the overall problem?

thelatemail
at.
  • Actually R has a `mode` function (though it serves a different purpose). Here are some suggestions for finding the mode (most common value) with R: https://stackoverflow.com/questions/2547402/how-to-find-the-statistical-mode – I_O May 01 '23 at 21:20
  • "They seemed inefficient" - what about the `fmode` function from the *collapse* package as noted here - https://stackoverflow.com/a/60765505/496803 - at the `r-faq` question for this topic? I would be shocked if that was not somewhat efficient as the processing is all pushed to compiled code, and it has a specific method for a grouped tibble. – thelatemail May 01 '23 at 21:33

1 Answer


We can use table + max.col

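# df holds the data from the question
df <- data.frame(a = c(1,1,1,2,2,2), b = factor(c('cat', 'dog', 'cat', 'cat', 'dog', 'dog')))
d <- table(df)  # contingency table: rows are values of a, columns are levels of b
data.frame(
  a = as.numeric(row.names(d)),
  b = colnames(d)[max.col(d)]  # level with the highest count in each row
)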

which gives

  a   b
1 1 cat
2 2 dog
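
One caveat with the `table` + `max.col` approach above: `max.col` breaks ties at random by default, so the result can change between runs when two levels are equally common. A minimal sketch, using made-up tied data (`tied` and `winner` are names chosen for illustration) and assuming you want the first level to win:

```r
# Hypothetical data: for a == 1, 'cat' and 'dog' each appear once (a tie)
tied <- data.frame(a = c(1, 1, 2, 2), b = factor(c('cat', 'dog', 'dog', 'dog')))
d <- table(tied)

# ties.method = "first" picks the leftmost column among the maxima,
# which makes the result deterministic
winner <- colnames(d)[max.col(d, ties.method = "first")]
winner
```

Whether "first" is the right tie-break depends on your data; the point is only that the default is random.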

or using dplyr like below

library(dplyr)
df %>%  # df is the data from the question
  group_by(a) %>%
  summarise(b = names(which.max(table(b))))

which gives

# A tibble: 2 × 2
      a b
  <dbl> <chr>
1     1 cat
2     2 dog
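
Note that `names(which.max(table(b)))` returns a character vector, so b is no longer a factor. If you want the call from the question to work as written and keep the factor type, one possible sketch is a small helper. `most_common` is the hypothetical name from the question, not an existing function, and this assumes dplyr >= 1.1.0 for the `.by` argument:

```r
library(dplyr)

# Hypothetical helper: returns the most frequent value of x, keeping its type.
# Ties go to the value that appears first in x.
most_common <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df <- tibble(a = c(1,1,1,2,2,2), b = factor(c('cat', 'dog', 'cat', 'cat', 'dog', 'dog')))

res <- df %>%
  summarise(b = most_common(b), .by = a)
res
```

Since the helper always returns exactly one value per group, `summarise` is enough here; `reframe` is only needed when a group can yield zero or several rows.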
ThomasIsCoding
  • I like your second solution, but `summarise` is deprecated. Apparently, `reframe` is supposed to be used instead. – at. May 01 '23 at 23:36
  • @at. - nope, `summarise` is not deprecated. Its use to return 0 or >1 rows is deprecated in preference to `reframe`. See https://dplyr.tidyverse.org/reference/summarise.html – thelatemail May 02 '23 at 01:31