I have a categorical variable with 14 levels, and I need to collapse several of them. For example, one level of this variable is "Start an Associate's degree" and another is "Complete an Associate's degree". I would like to merge these two levels into a single level called "Complete an Associate's degree". What is the most efficient way to do this in R?
-
Please can you provide a reproducible example with example data? – coffeinjunky Aug 08 '21 at 02:04
-
Do you currently have an inefficient way to do this? Is this a performance bottleneck? Maybe try the `dplyr::recode()` function. – MrFlick Aug 08 '21 at 02:06
-
If the below answers aren't helping, please have a look here on how to use `dput` and provide a sample of your data. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – coffeinjunky Aug 08 '21 at 09:20
2 Answers
Try something like the following:
library(forcats)
df <- data.frame(col1 = factor(c('a', 'b', 'c')))
# to recode the factor level 'b' as c:
df$col1 <- fct_recode(df$col1, c = 'b')
str(df)
#> 'data.frame': 3 obs. of 1 variable:
#> $ col1: Factor w/ 2 levels "a","c": 1 2 2
Created on 2021-08-08 by the reprex package (v2.0.1)
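For the levels named in the question, a minimal sketch along the same lines could use `fct_collapse()` from forcats, which takes the new level name on the left and the old levels to merge on the right. The vector `edu` below is made-up example data, not the asker's actual column.

```r
library(forcats)

# Hypothetical data using three of the education labels from the question
edu <- factor(
  c("Start an Associate's degree",
    "Complete an Associate's degree",
    "Start a Bachelor's degree",
    "Start an Associate's degree"),
  levels = c("Start an Associate's degree",
             "Complete an Associate's degree",
             "Start a Bachelor's degree")
)

# New level name = character vector of the old levels to merge into it
edu2 <- fct_collapse(
  edu,
  "Complete an Associate's degree" = c("Start an Associate's degree",
                                       "Complete an Associate's degree")
)

levels(edu2)  # the two Associate's levels are now one
table(edu2)   # counts per remaining level
```

Levels that are not mentioned in the call are left unchanged, so only the levels being merged need to be listed.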

coffeinjunky
-
As I shared above, I am new to R, so please be nice. I tried to use some R code to adjust the levels. Q1 is my data set. X1PAREDEXPCT is the categorical variable, which is ordinal. One level of this variable is "Start, but not complete Associates degree". Another level is "Complete Associate's degree". – Byron Sharer Robertson Aug 08 '21 at 06:42
-
I used the following code to try to merge the two levels: levels(Q1$X1PAREDEXPCT)[levels(Z2$X1PAREDEXPCT)=="Start, but not complete Associate's degree"] <-"Complete an Associate's degree" – Byron Sharer Robertson Aug 08 '21 at 06:43
Just use `levels`:
x <- factor(LETTERS[1:10])
x
# [1] A B C D E F G H I J
# Levels: A B C D E F G H I J
table(x)
# x
# A B C D E F G H I J
# 1 1 1 1 1 1 1 1 1 1
levels(x) <- c("A", "B", "C", "A", "B", "C", "G", "G", "J", "J")
x
# [1] A B C A B C G G J J
# Levels: A B C G J
table(x)
# x
# A B C G J
# 2 2 2 2 2
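When only one level needs to be folded into another, the same idea can be written as a single indexed assignment. The sketch below assumes a small stand-in data frame named `Q1` with a column `X1PAREDEXPCT`; the rows are invented for illustration. Two things matter for it to work: the same column has to appear on both sides of the index, and the old label has to match an existing level exactly.

```r
# Hypothetical stand-in for the data frame and column discussed in the comments
Q1 <- data.frame(
  X1PAREDEXPCT = factor(c("Start an Associate's degree",
                          "Complete an Associate's degree",
                          "Start a Bachelor's degree",
                          "Start an Associate's degree"))
)

from <- "Start an Associate's degree"     # level to fold away
to   <- "Complete an Associate's degree"  # level to keep

# Index with the levels of the same column that is being modified;
# giving two levels the same label merges them
levels(Q1$X1PAREDEXPCT)[levels(Q1$X1PAREDEXPCT) == from] <- to

levels(Q1$X1PAREDEXPCT)
table(Q1$X1PAREDEXPCT)
```

If `from` does not match any level exactly (for example because of a missing apostrophe), the index selects nothing and the factor is left unchanged, so checking `levels()` before and after is a cheap sanity test.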

dcarlson
-
Thanks so much for the response. I am new to R, so please excuse my naivety. I am trying to combine the levels using the following code: – Byron Sharer Robertson Aug 08 '21 at 06:39
-
levels(Q1$X1PAREDEXPCT)[levels(Z2$X1PAREDEXPCT)=="Start, but not complete Associate's degree"] <-"Complete an Associate's degree" – Byron Sharer Robertson Aug 08 '21 at 06:39
-
Q1 is my data set. X1PAREDEXPCT is the categorical variable, which is ordinal. One level of this variable is "Start, but not complete Associates degree". Another level is "Complete Associate's degree". – Byron Sharer Robertson Aug 08 '21 at 06:44
-
You need to edit your question to add these details. If the categories are ordinal, you should only be combining adjacent categories. We need to know how many categories there are and how many you want to combine. – dcarlson Aug 08 '21 at 13:41
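To illustrate the point about adjacent categories, here is a small sketch with an ordered factor. The vector `edu` and its four levels are hypothetical example data that reuse labels from this thread. Merging two neighbouring levels through `levels<-` keeps the factor ordered and keeps the remaining levels in their original relative order.

```r
# Hypothetical ordered factor with four education levels, lowest to highest
lv <- c("High school diploma or GED",
        "Start an Associate's degree",
        "Complete an Associate's degree",
        "Start a Bachelor's degree")
edu <- factor(c(lv[2], lv[4], lv[3], lv[1]), levels = lv, ordered = TRUE)

# Merge the two adjacent Associate's levels by giving them the same label
levels(edu)[levels(edu) == "Start an Associate's degree"] <-
  "Complete an Associate's degree"

is.ordered(edu)                    # check that the factor is still ordered
levels(edu)                        # remaining levels, lowest to highest
edu < "Start a Bachelor's degree"  # ordinal comparisons still work
```

Merging non-adjacent levels in the same way would run without error, but the resulting order would no longer have a clear meaning.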
-
"Less than high school" "High school diploma or GED" "Start an Associate's degree" [4] "Complete an Associate's degree" "Start a Bachelor's degree" "Complete a Bachelor's degree" [7] "Start a Master's degree" "Complete a Master's degree" "Start Ph.D/M.D/Law/other prof degree" [10] "Complete Ph.D/M.D/Law/other prof degree" "Don't know" "Item legitimate skip/NA" [13] "Unit non-response" "Missing" – Byron Sharer Robertson Aug 09 '21 at 04:56