I have a column in a data frame with over 40 levels, and I want to reduce it to 4 levels. The important categories are "ecommerce", "technology", and "consumer goods"; everything else should fall under "other". How can I collapse it into 4 levels?
3 Answers
We can use %in% to find the values that are not among the three categories and replace them with 'Other':
df$column_name <- as.character(df$column_name)
df$column_name[!df$column_name %in% c('ecommerce', 'technology', 'consumer goods')] <- 'Other'
df$column_name <- factor(df$column_name)  # optional: convert back to a factor with 4 levels
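A self-contained run of the character-vector approach on a small data frame. The sample values are made up for illustration, and passing an explicit `levels` vector to factor() is an optional extra that puts the kept categories first and "Other" last:

```r
# Toy data: the sample categories below are made up for illustration
df <- data.frame(
  column_name = c("ecommerce", "retail", "technology", "banking",
                  "consumer goods", "healthcare", "technology"),
  stringsAsFactors = TRUE
)

keep <- c("ecommerce", "technology", "consumer goods")

# Work on a character copy so any value can be overwritten freely
df$column_name <- as.character(df$column_name)
df$column_name[!df$column_name %in% keep] <- "Other"

# Convert back to a factor with the kept levels first and "Other" last
df$column_name <- factor(df$column_name, levels = c(keep, "Other"))

print(table(df$column_name))
```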
If you want to keep the column as a factor, add 'Other' to its levels first, then drop the levels that are no longer used:
levels(df$column_name) <- c(levels(df$column_name), 'Other')
df$column_name[!df$column_name %in% c('ecommerce', 'technology', 'consumer goods')] <- 'Other'
df$column_name <- droplevels(df$column_name)  # remove the unused original levels
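On a factor, the original levels stay attached after the replacement even when no value uses them anymore, so nlevels() keeps reporting the full count until droplevels() is applied. A small sketch with made-up values:

```r
# Made-up factor with six levels
x <- factor(c("ecommerce", "retail", "technology", "banking",
              "consumer goods", "healthcare"))
keep <- c("ecommerce", "technology", "consumer goods")

# Add "Other" as a level before assigning it; otherwise R inserts NA with a warning
levels(x) <- c(levels(x), "Other")
x[!x %in% keep] <- "Other"

n_before <- nlevels(x)  # six original levels plus "Other", some now unused
x <- droplevels(x)      # remove levels that no longer occur
n_after <- nlevels(x)
print(levels(x))
```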

Ronak Shah
I want "ecommerce", "technology" and "consumer goods" to be levels too; sorry I didn't make that clear! So I want 4 levels, not a binary variable. @RonakShah – sjoelly Apr 30 '20 at 02:38
@sjoelly The result is not binary with either option. The second option will give you 4 levels. – Ronak Shah Apr 30 '20 at 02:40
forcats::fct_other() was designed for exactly this:
library(forcats)
fct_other(my_var, keep = c('ecommerce', 'technology', 'consumer goods'))
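If adding the forcats dependency is not an option, a similar collapse can be done in base R by assigning a named list to levels(): each name becomes a new level, and each element lists the old levels it absorbs. This is a swapped-in base R technique, not how fct_other() itself is implemented; the values below are made up:

```r
# Made-up factor standing in for the real column
x <- factor(c("ecommerce", "retail", "technology", "banking",
              "consumer goods", "healthcare"))
keep <- c("ecommerce", "technology", "consumer goods")

# Each kept level maps to itself; every remaining level maps to "Other"
mapping <- c(setNames(as.list(keep), keep),
             list(Other = setdiff(levels(x), keep)))
levels(x) <- mapping

print(levels(x))
print(as.character(x))
```

The new levels follow the order of the list names, so the kept categories come first and "Other" last.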

Ritchie Sacramento
You can add a new column holding the collapsed categories by using ifelse() inside a dplyr mutate() call:
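library(dplyr)
newdf <- df %>%
  # as.character() keeps labels rather than integer codes if allcategories is a factor
  mutate(newcategories = factor(ifelse(allcategories %in% c("ecommerce", "technology", "consumer goods"),
                                       as.character(allcategories), "Other")))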
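One pitfall with ifelse() here: if allcategories is a factor (for example, a string column read with stringsAsFactors = TRUE, which was the default before R 4.0.0), ifelse() returns the factor's integer codes instead of its labels, so the kept categories come out as numbers. Converting with as.character() first avoids this. A base R sketch on a made-up vector:

```r
# Made-up factor standing in for df$allcategories
allcategories <- factor(c("ecommerce", "retail", "technology", "banking"))
keep <- c("ecommerce", "technology", "consumer goods")

# Pitfall: the factor's integer codes leak into the result
bad <- ifelse(allcategories %in% keep, allcategories, "Other")

# Fix: convert to character before choosing between the two branches
good <- factor(ifelse(allcategories %in% keep,
                      as.character(allcategories), "Other"))

print(bad)
print(good)
```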
Keeping the original column also lets you check how often each original category was collapsed into "Other":
newdf %>%
  group_by(newcategories, allcategories) %>%
  filter(newcategories == "Other") %>%
  count() %>%
  arrange(desc(n))
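Without dplyr, a similar frequency check can be sketched in base R with table() on the original values that ended up as "Other". The toy data frame below is made up and reuses the column names from the answer:

```r
# Made-up data using the same column names as the answer
newdf <- data.frame(
  allcategories = c("retail", "banking", "retail", "ecommerce",
                    "healthcare", "retail", "technology"),
  stringsAsFactors = FALSE
)
keep <- c("ecommerce", "technology", "consumer goods")
newdf$newcategories <- factor(ifelse(newdf$allcategories %in% keep,
                                     newdf$allcategories, "Other"))

# Count the original categories collapsed into "Other", most frequent first
other_counts <- sort(table(newdf$allcategories[newdf$newcategories == "Other"]),
                     decreasing = TRUE)
print(other_counts)
```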

Dealec