I'd like to find the most frequently occurring level of a factor in a dataset while using dplyr's piping structure. I'm trying to create a new variable that contains the 'modal' factor level when grouped by another variable.

This is an example of what I'm looking for:

library(dplyr)  # attaches %>%, group_by() and mutate()
df <- data.frame(cat = stringi::stri_rand_strings(100, 1, '[A-Z]'), num = floor(runif(100, min=0, max=500)))
df <- df %>%
            dplyr::group_by(cat) %>%
            dplyr::mutate(cat_mode = Mode(num))

Here, "Mode" is the function I'm looking for.

Nate
Parseltongue
  • Mode-determination is hard and problem-ridden, especially when the data is not cleanly unimodal. Are you looking for the mathematical mode (with smoothing) or the most frequent occurrence? – r2evans Jul 18 '18 at 23:04
  • I'm looking for the most frequent occurrence of a categorical variable with few levels (7) – Parseltongue Jul 18 '18 at 23:05
  • If Psidom's answer isn't good enough, it would help if you provided the expected result from this sample data. Because you are using random data, you'll need to revise your question with `set.seed` to make it reproducible. – r2evans Jul 18 '18 at 23:07
  • if your variable is already a `factor` you might be able to take advantage of `forcats::fct_infreq()` – Nate Jul 18 '18 at 23:11
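
The `forcats::fct_infreq()` suggestion in the last comment can be sketched as follows. This is only an illustration on made-up data: `fct_infreq()` reorders a factor's levels by descending frequency, so the first level of the result is the most frequent one. The names `f` and `top_level` are placeholders, not part of the question.

```r
library(forcats)

# Toy factor (made-up data for illustration)
f <- factor(c("x", "y", "y", "z", "y", "x"))

# fct_infreq() puts the most frequent level first,
# so the first level of the reordered factor is the modal level
top_level <- levels(fct_infreq(f))[1]
top_level
```

The result is a character string naming the level, so wrap it in `factor()` if a factor is needed downstream.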

2 Answers

Use `table` to count the items, then `which.max` to find the most frequent one:

df %>%
    group_by(cat) %>%
    mutate(cat_mode = names(which.max(table(num)))) %>% 
    head()

# A tibble: 6 x 3
# Groups: cat [4]
#  cat      num cat_mode
#  <fctr> <dbl> <chr>   
#1 Q      305   138     
#2 W       34.0 212     
#3 R       53.0 53      
#4 D      395   5       
#5 W      212   212     
#6 Q      417   138  
# ...
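
Because `names()` returns a character vector, `cat_mode` comes back as `<chr>` in the output above. If a numeric result is wanted, one option is to convert it back. The following is a minimal sketch under that assumption; `table_mode` and the toy data frame are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical helper: most frequent value via table(), converted back
# to numeric. Assumes x is numeric. On a tie the smallest value wins,
# because table() sorts its labels and which.max() returns the first maximum.
table_mode <- function(x) {
  as.numeric(names(which.max(table(x))))
}

# Toy data (made up for illustration)
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b"),
  num = c(10, 20, 20, 7, 7)
)

res <- toy %>%
  group_by(grp) %>%
  mutate(num_mode = table_mode(num)) %>%  # one value per group, recycled to every row
  ungroup()

res
```

The round trip through character labels is fine for whole numbers like these, but it may lose precision for arbitrary doubles.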
Psidom
This is similar to the question "Is there a built-in function for finding the mode?". Define a `Mode` helper and call it inside `mutate`:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df %>% 
  group_by(cat) %>% 
  mutate(cat_mode = Mode(num))

# A tibble: 100 x 3
# Groups:   cat [26]
   cat     num cat_mode
   <fct> <dbl>    <dbl>
 1 S        25       25
 2 V        86      478
 3 R       335      335
 4 S       288       25
 5 S       330       25
 6 Q       384      384
 7 C       313      313
 8 H       275      275
 9 K       274      274
10 J        75       75
# ... with 90 more rows
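
Since the question is really about a categorical variable, it may help to see how this `Mode` helper behaves on a factor and on ties. A small sketch on made-up data; the definition is repeated from the answer so the snippet runs on its own, and the comments are added for illustration.

```r
# Same Mode() as in the answer, with explanatory comments added
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # first value with the highest count
}

# Toy factor (made-up data for illustration)
f <- factor(c("low", "high", "high", "mid", "low", "high"))
m <- Mode(f)
m  # a length-1 factor with value "high"; the type of the input is preserved

# On a tie, the value that appears first in the data wins
tie <- Mode(c("b", "a", "a", "b"))
tie  # "b"
```

Unlike the `table`-based approach, this keeps the input's type, so a factor column yields a factor and a numeric column yields a number.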

To see one mode per level of `cat`, use `summarise` instead of `mutate`:

df %>% 
  group_by(cat) %>% 
  summarise(cat_mode = Mode(num))

# A tibble: 26 x 2
   cat   cat_mode
   <fct>    <dbl>
 1 A          480
 2 B          380
 3 C          313
 4 D          253
 5 E          202
 6 F           52
 7 G          182
 8 H          275
 9 I          356
10 J           75
# ... with 16 more rows
Vivek Katial