-1

I have a categorical variable with over 1000 levels. I want to group the levels together to reduce the dimensionality to just 5 general levels, putting levels with similar names into the same group.

For example, all levels that contain the word "immune" I want to group into a new group called "immune group". All levels that contain the word "eyes" I want to group into a new group called "eye group", etc.

I've tried str_detect and grepl in R with little success. Are there other methods that could do this efficiently?

statquest
  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Maybe [forcats](https://forcats.tidyverse.org/reference/index.html) can help. – MrFlick Dec 15 '22 at 21:42

2 Answers

0

Maybe use case_when from dplyr together with str_detect from stringr. It would help to have a reproducible example, though.

uow
  • 2
    If there isn't enough information in the question to answer it, you should post a comment asking for clarification (you have enough rep) rather than post an answer that's just a guess – camille Dec 16 '22 at 16:48
  • didn't know, thanks for telling! – uow Dec 16 '22 at 19:48
0
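library(dplyr)
library(stringr)

# Example levels to be grouped
x <- c("immune1", "immune2", "eyes1", "eyes2")

# Map each value to a group by keyword; unmatched values become NA
case_when(
  str_detect(x, "immune") ~ "immune group",
  str_detect(x, "eyes") ~ "eye group",
  TRUE ~ NA_character_
)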
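With over 1000 levels, it may be cheaper to classify each distinct level once and then relabel the factor, rather than testing every row. The following is only a sketch in base R under some assumptions: the factor `f`, the `patterns` lookup, and the "other" fallback are all hypothetical and not taken from the question. It relies on the fact that assigning duplicate labels to `levels()` merges the corresponding levels.

```r
# Hypothetical example data: a factor whose level names contain a keyword.
f <- factor(c("immune1", "eyes2", "immune2", "eyes1", "skin1", "immune1"))

# Hypothetical lookup: regex pattern -> group name.
patterns <- c(immune = "immune group", eyes = "eye group")

# Classify each distinct level once instead of every observation.
old_levels <- levels(f)
new_levels <- rep("other", length(old_levels))  # fallback for unmatched levels

# Loop in reverse so that earlier patterns overwrite later ones
# when a level matches more than one pattern.
for (p in rev(names(patterns))) {
  hit <- grepl(p, old_levels, ignore.case = TRUE)
  new_levels[hit] <- patterns[[p]]
}

# Assigning duplicate labels merges the matching levels into one.
grouped <- f
levels(grouped) <- new_levels

table(grouped)
```

To keep unmatched levels as missing values instead, as in the case_when version above, the fallback could be changed from "other" to NA_character_.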
bischrob