I'm having difficulty combining the levels of a factor into fewer levels and converting the letter levels into dummy codes. I have a 10-level factor called Marital_Status which I would like to collapse into 4 levels: levels B and G into 0; levels C, D, H, and I into 1; levels E and F into 2; and levels A and J into 3.

Austin
  • Two examples of this are given in the help file for `?levels` - something like http://stackoverflow.com/a/42686385/496803 – thelatemail Mar 13 '17 at 05:03

1 Answer

Use the excellent new forcats package.

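library(dplyr)    # provides %>%, mutate() and tibble()
library(forcats)  # provides fct_collapse()

# dummy dataset (tibble() replaces the now-deprecated data_frame())
df_foo = tibble(
  X1 = sample(LETTERS[1:10], 100, replace = TRUE)
)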

# collapse factor variable into fewer levels
df_foo = df_foo %>% 
  mutate(
    X2 = fct_collapse(
      X1,
      "0" = c("B", "G"),
      "1" = c("C", "D", "H", "I"),
      "2" = c("E", "F"),
      "3" = c("A", "J")
    )
  )
tchakravarty
  • The code works great but how do I get it to create the new variable in my main data.frame? – Austin Mar 13 '17 at 05:12
  • Which is nearly identical to the existing `levels<-` capability of base R - `\`levels<-\`(df_foo$X1, list( "0" = c("B", "G"), "1" = c("C", "D", "H", "I"), "2" = c("E", "F"), "3" = c("A", "J") ) )` for instance. – thelatemail Mar 13 '17 at 05:15
  • @thelatemail is there a reason for choosing one function over the other? I have been trying to get this code to work for almost an hour now, and Hadley's R for Data Science book suggests the forcats package. I will try the levels code you provided; it is much simpler than the examples you linked from previous posts. – Austin Mar 13 '17 at 05:23
  • @AustinMullings - personal preference largely. Hadley wrote `forcats` so it stands to reason he'd suggest it. I'd say choose whatever you find works best for you. `forcats` has a bunch of other factor manipulation functions which you might find useful. – thelatemail Mar 13 '17 at 05:29
  • @thelatemail My new job revolves predominately around categorical variables which I have rarely had to work with so I will look more at the various capabilities in the forcats package. Thanks! – Austin Mar 13 '17 at 05:33
  • @AustinMullings `mutate` assigns the collapsed factor variable to the new variable `X2`. Just assign the `mutate`d `data_frame` back to `df_foo` (or to a new `data_frame`). I have updated my answer to do the former. – tchakravarty Mar 13 '17 at 06:03
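
For anyone comparing the two approaches discussed in the comments, here is a minimal sketch of the base R `levels<-` route applied directly to a column of the main data frame. The names `main_df` and `Marital_Code` and the randomly generated data are assumptions for illustration only; just `Marital_Status` and the level mapping come from the question.

```r
# Hypothetical stand-in for the asker's data; only the column name
# Marital_Status and the mapping below come from the question.
set.seed(1)
main_df <- data.frame(
  Marital_Status = factor(
    sample(LETTERS[1:10], 100, replace = TRUE),
    levels = LETTERS[1:10]
  )
)

# Copy the column, then relabel its levels. Assigning a named list to
# levels() merges every old level listed under the same new name.
main_df$Marital_Code <- main_df$Marital_Status
levels(main_df$Marital_Code) <- list(
  "0" = c("B", "G"),
  "1" = c("C", "D", "H", "I"),
  "2" = c("E", "F"),
  "3" = c("A", "J")
)

# Cross-tabulate old against new levels to confirm the mapping.
table(main_df$Marital_Status, main_df$Marital_Code)
```

Because the result is stored in a new column, the original `Marital_Status` stays available for checking. Any old level left out of the list would become `NA`, so it is worth confirming that all ten letters are covered.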