I want to lump the infrequent levels of several factor variables into 'other'. The reproducible example below has two factor variables, animal and color, that I want to lump. Lumping works for a single variable, but not when I put the column names in a vector and loop over it. My actual data set has dozens of such variables, and I am looking for a clean way to do this with dplyr.

library(tidyverse)
library(forcats)

data <- data.frame(
  ID = 1:12,
  animal = c('dog','cat','fish','dog','dog','dog','fish','fish','fish','snake','fish','dog'),
  color = c('red','green','blue','red','green','red','green','red','green','red','green','red')
)

### Does not work when I use a list and for loop

factor_columns <- c('animal','color')
for (feature in factor_columns) {
  data <- data %>%
    mutate(feature = fct_lump_prop(
      f = feature,
      prop = 0.2,
      other_level = 'other'
    ))} 

### Works with one column

data <- data %>%
  mutate(animal = fct_lump_prop(
    f = animal,
    prop = 0.2,
    other_level = 'other'
  )) 
Ronak Shah
achilet

1 Answer

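The loop fails because mutate(feature = ...) writes to a column literally named feature instead of animal or color. You can use across() instead: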

library(dplyr)
library(forcats)

data <- data %>%
  # all_of() selects the columns named in the character vector
  mutate(across(all_of(factor_columns),
                ~ fct_lump_prop(.x, prop = 0.2, other_level = 'other')))
  # mutate_at in old dplyr
  # mutate_at(vars(factor_columns), fct_lump_prop, prop = 0.2, other_level = 'other')
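With dozens of columns, it may be easier to select them by type than to list their names. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for where()) and assuming that every character or factor column in the data should be lumped; that selection rule is an illustration, not something stated in the question.

```r
library(dplyr)
library(forcats)

data <- data.frame(
  ID = 1:12,
  animal = c('dog','cat','fish','dog','dog','dog','fish','fish','fish','snake','fish','dog'),
  color = c('red','green','blue','red','green','red','green','red','green','red','green','red')
)

# Assumption: all character/factor columns should be lumped; ID is integer and is skipped
lumped <- data %>%
  mutate(across(
    where(~ is.character(.x) || is.factor(.x)),
    ~ fct_lump_prop(.x, prop = 0.2, other_level = 'other')
  ))

# cat and snake each make up 1/12 of the rows, so they are merged into 'other'
levels(lumped$animal)
```

If only some of the text columns should be lumped, the explicit all_of(factor_columns) selection is the safer choice.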

You can also use lapply():

data[factor_columns] <- lapply(data[factor_columns],
                               fct_lump_prop, prop = 0.2, other_level = 'other')
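If you would rather keep the for loop from the question, it can probably be repaired with tidy evaluation. This sketch assumes a dplyr version that supports the .data pronoun and the := operator: !!feature := sets the output column name from the string, and .data[[feature]] reads the column with that name.

```r
library(dplyr)
library(forcats)

data <- data.frame(
  ID = 1:12,
  animal = c('dog','cat','fish','dog','dog','dog','fish','fish','fish','snake','fish','dog'),
  color = c('red','green','blue','red','green','red','green','red','green','red','green','red')
)

factor_columns <- c('animal', 'color')

for (feature in factor_columns) {
  data <- data %>%
    # !!feature := names the output column after the string held in `feature`;
    # .data[[feature]] looks up the existing column with that name
    mutate(!!feature := fct_lump_prop(
      f = .data[[feature]],
      prop = 0.2,
      other_level = 'other'
    ))
}

# The original columns are overwritten and no stray `feature` column is created
names(data)
levels(data$animal)
```

The across() version is shorter, so the loop is mainly useful when each column needs different handling.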
Ronak Shah