2

I have a column in a dataframe with over 40 levels, and I want to reduce it to 4. The levels I care about are "ecommerce", "technology", and "consumer goods"; everything else should fall under "other". How can I collapse the column to these 4 levels?

sjoelly

3 Answers

6

We can use %in% to find the values that are not among the ones to keep, and overwrite them:

df$column_name <- as.character(df$column_name)
df$column_name[!df$column_name %in% c('ecommerce', 'technology', 'consumer goods')] <- 'Other'

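If you want to keep the column as a factor instead (skipping the as.character() conversion), add the new level first, then drop the levels that are no longer used:

levels(df$column_name) <- c(levels(df$column_name), 'Other')
df$column_name[!df$column_name %in% c('ecommerce', 'technology', 'consumer goods')] <- 'Other'
# Without this the factor still carries all of the original, now-empty levels
df$column_name <- droplevels(df$column_name)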
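
To see what the factor route does to the level count, here is a minimal sketch on a made-up data frame. The contents of `df` and the `keep` vector are illustrative assumptions, not the asker's data. It compares the number of levels before and after calling `droplevels()`:

```r
# Toy data standing in for the real column (hypothetical values)
df <- data.frame(
  column_name = factor(c("ecommerce", "finance", "technology",
                         "retail", "consumer goods", "finance"))
)
keep <- c("ecommerce", "technology", "consumer goods")

# Register the new level first, otherwise the assignment produces NA
levels(df$column_name) <- c(levels(df$column_name), "Other")
df$column_name[!df$column_name %in% keep] <- "Other"

n_before <- nlevels(df$column_name)  # still counts the now-empty levels

# Remove levels that no longer occur in the data
df$column_name <- droplevels(df$column_name)
n_after <- nlevels(df$column_name)

table(df$column_name)
```

Comparing `n_before` and `n_after` shows whether any empty levels were left behind, which matters for things like `table()` output and model contrasts.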
Ronak Shah
  • I want "ecommerce", "technology" and "consumer goods" to also be levels, sorry I didn't make that clear! So I want to make it 4 levels, not binary. @RonakShah – sjoelly Apr 30 '20 at 02:38
  • @sjoelly The result is not binary in both the options. It will give you 4 levels if you use second option. – Ronak Shah Apr 30 '20 at 02:40
1

forcats::fct_other() was designed for exactly this:

library(forcats)

fct_other(my_var, keep = c('ecommerce', 'technology', 'consumer goods'))
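
As a small illustration of the call above, the sketch below runs `fct_other()` on a made-up factor and uses its `other_level` argument to get the lowercase "other" label from the question. The values in `my_var` are hypothetical, and the forcats package is assumed to be installed:

```r
library(forcats)

# Hypothetical factor with five levels
my_var <- factor(c("ecommerce", "finance", "technology",
                   "retail", "consumer goods", "finance"))

# Everything not listed in `keep` is collapsed into `other_level`
collapsed <- fct_other(
  my_var,
  keep = c("ecommerce", "technology", "consumer goods"),
  other_level = "other"
)

levels(collapsed)
table(collapsed)
```

Assigning the result back (for example to `df$column_name`) replaces the original column, since `fct_other()` returns a new factor rather than modifying its input.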
Ritchie Sacramento
1

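You can add an extra column holding the new categories with ifelse() in a dplyr chain. If the original column is a factor, convert it to character inside ifelse(), otherwise the kept values come back as integer codes:

library(dplyr)

newdf <- df %>%
  # as.character() keeps the labels rather than the underlying factor codes
  mutate(newcategories = factor(ifelse(allcategories %in% c("ecommerce", "technology", "consumer goods"), as.character(allcategories), "Other")))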

This would allow you to check the frequency of the categories assigned to "Other":

newdf %>%
  group_by(newcategories, allcategories) %>%
  filter(newcategories == "Other") %>%
  count() %>%
  arrange(desc(n))
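
One thing worth checking with the ifelse() approach is how it treats a factor input. The base R sketch below uses a made-up `allcategories` vector (an assumption for illustration) to compare passing the factor directly with passing it through `as.character()`:

```r
# Hypothetical factor column
allcategories <- factor(c("ecommerce", "finance", "technology", "retail"))
keep <- c("ecommerce", "technology", "consumer goods")

# ifelse() drops the factor attributes, so kept values come back as integer codes
codes <- ifelse(allcategories %in% keep, allcategories, "Other")

# Converting to character first preserves the labels
kept_labels <- ifelse(allcategories %in% keep, as.character(allcategories), "Other")

print(codes)
print(kept_labels)
```

If the column is already character, both calls give the same result, so the conversion is harmless either way.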
Dealec