I'm having difficulty combining the levels of a factor into fewer levels and converting the letter levels into dummy codes. I have a 10-level factor called Marital_Status that I would like to collapse into 4 levels: levels B and G into 0, levels C, D, H, and I into 1, levels E and F into 2, and levels A and J into 3.
Use the excellent new `forcats` package.
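library(dplyr)    # provides tibble(), mutate() and the %>% pipe
library(forcats)  # provides fct_collapse()

# dummy dataset: 100 random draws from the letters A to J
set.seed(1)
df_foo = tibble(
  X1 = sample(LETTERS[1:10], 100, replace = TRUE)
)

# collapse factor variable into fewer levels;
# each name on the left is a new level, each vector lists the old levels it absorbs
df_foo = df_foo %>%
  mutate(
    X2 = fct_collapse(
      X1,
      "0" = c("B", "G"),
      "1" = c("C", "D", "H", "I"),
      "2" = c("E", "F"),
      "3" = c("A", "J")
    )
  )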
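`fct_collapse()` only relabels the factor: the new levels are the character strings "0" to "3", not numbers. If "dummy codes" in the question means numeric codes or 0/1 indicator columns, the following is a minimal sketch of both, assuming the numeric code should be the number written in each label. The names `status`, `status4`, `status_num` and `dummies` are hypothetical. Note that calling `as.integer()` directly on a factor returns each value's level position, which need not match its label, so the conversion goes through `as.character()` first.

```r
library(forcats)

# Hypothetical example data: declare all ten levels so none is reported missing
status <- factor(c("A", "B", "C", "E", "G", "J"), levels = LETTERS[1:10])

# Collapse the ten letters into four groups, as in the answer above
status4 <- fct_collapse(
  status,
  "0" = c("B", "G"),
  "1" = c("C", "D", "H", "I"),
  "2" = c("E", "F"),
  "3" = c("A", "J")
)

# Numeric codes taken from the labels; as.integer(status4) alone would
# return level positions in whatever order fct_collapse() assigned them
status_num <- as.integer(as.character(status4))
status_num  # 3 0 1 2 0 3

# 0/1 indicator columns, one per collapsed level (no intercept column)
dummies <- model.matrix(~ status4 - 1)
dummies
```

Whether the numeric codes or the indicator matrix is appropriate depends on the downstream model; most R modelling functions build the indicators themselves from a factor.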

tchakravarty
- The code works great, but how do I get it to create the new variable in my main data.frame? – Austin Mar 13 '17 at 05:12
- This is nearly identical to the existing `levels<-` capability of base R, for instance `` `levels<-`(factor(df_foo$X1), list("0" = c("B", "G"), "1" = c("C", "D", "H", "I"), "2" = c("E", "F"), "3" = c("A", "J"))) ``. – thelatemail Mar 13 '17 at 05:15
- @thelatemail, is there a reason to choose one function over the other? I have been trying to get this code to work for almost an hour now, and Hadley's R for Data Science book suggests the forcats package. I will try the levels code you provided; it is much simpler than the examples you linked from previous posts. – Austin Mar 13 '17 at 05:23
- @AustinMullings: largely personal preference. Hadley wrote `forcats`, so it stands to reason he'd suggest it. I'd say choose whatever works best for you. `forcats` has a bunch of other factor manipulation functions that you might find useful. – thelatemail Mar 13 '17 at 05:29
- @thelatemail My new job revolves predominantly around categorical variables, which I have rarely had to work with, so I will look more closely at the various capabilities of the forcats package. Thanks! – Austin Mar 13 '17 at 05:33
- @AustinMullings `mutate` assigns the collapsed factor variable to the new variable `X2`. Just assign the result of `mutate` back to `df_foo` (or to a new data frame). I have updated my answer to do the former. – tchakravarty Mar 13 '17 at 06:03
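For the question of getting the new variable into the main data frame without dplyr, here is a minimal base R sketch of the `levels<-` approach from the comments. It assumes a hypothetical data frame `main_df` whose `Marital_Status` column is already a factor; `Marital_Status4` is a hypothetical name for the new column. Assigning a named list to `levels()` collapses the old levels listed in each element into the level given by that element's name.

```r
# Hypothetical main data frame with a ten-level Marital_Status factor
main_df <- data.frame(
  id = 1:6,
  Marital_Status = factor(c("A", "B", "C", "E", "G", "J"), levels = LETTERS[1:10])
)

# Copy the factor, then collapse its levels with a named list:
# each name becomes a new level, each element lists the old levels it absorbs
status4 <- main_df$Marital_Status
levels(status4) <- list(
  "0" = c("B", "G"),
  "1" = c("C", "D", "H", "I"),
  "2" = c("E", "F"),
  "3" = c("A", "J")
)

# Store the result as a new column of the main data frame
main_df$Marital_Status4 <- status4
main_df
```

Any old level not listed in the named list would become `NA`, so every original level should appear in exactly one element.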