Collapsing categorical levels in R

Question

I have data (s_data).

levels(as.factor(s_data$education))

"10th"          "11th"         "12th"         "1st-4th"     
"5th-6th"      "7th-8th"      "9th"          "Assoc-acdm"    
"Assoc-voc"    "Bachelors"    "Doctorate"    "HS-grad"     
"Masters"      "Preschool"    "Prof-school"  "Some-college"

I am trying to collapse several levels into one category. For example, "Preschool" and "1st-4th" would become one category, "kids". I have tried a couple of approaches with no success.

s_data$education <- case_when(data$education %in% c("1st-4th", "5th-6th", "7th-8th",
                                      "9th","Preschool") ~ "kids") #s_data is an adjusted version of data

This approach tries to replace each row and doesn't yield anything but an error.

I have also tried our teacher's approach, but when I plotted the new data, it did not contain the new category ("kids") at all.

levels(as.factor(s_data$education)) <- c("10th","11th,", "kids", "kids", "kids","7th-8th", "9th", "Assoc-acdm",
                                         "Assoc-voc", "Bachelors","Doctorate","HS-grad", "Masters","kids", "Prof-school",
                                         "Some-college")

Do you have any ideas on how I can collapse these levels into one category?

Thank you!

First, *"doesn't yield anything but an error"* would do better if you included the error. Second, your `case_when` is incomplete: you check for one membership and reassign based on it, and then discard everything else in `education`. It is not doing a piecewise replacement; I suggest you read its documentation to understand what it is doing. Third, I don't know if it is originally a `factor` or if your `levels` call was merely to show all unique values. Can you provide an unambiguous sample of that vector? Perhaps `dput(s_data$education)` (or a subset of it if it is a large dataset). — r2evans, Dec 11 '20 at 14:57
Error in `$<-.data.frame`(`*tmp*`, education, value = c(NA, NA, NA, NA, : replacement has 48842 rows, data has 48182. The `levels` call was merely to show all unique values. `dput(s_data$education)` prints the value for every row. What I need to do is create a new category that overwrites the categories I wish to collapse. I hope this is clearer now. — Shlomi, Dec 11 '20 at 15:07

score 0 · Answer 1 · answered Dec 11 '20 at 15:42

Because you keep calling as.factor (and we don't have your data, large as it may be), it is not clear to me whether your s_data$education is of class character or factor.

If is.character(s_data$education) is true, then replace the matching values directly:

educ <- c("10th", "11th", "12th", "1st-4th", "5th-6th", "7th-8th", "9th", "Assoc-acdm", "Assoc-voc", "Bachelors", "Doctorate", "HS-grad", "Masters", "Preschool", "Prof-school", "Some-college")
educ[educ %in% c("9th", "10th", "11th", "12th")] <- "HS"
educ
#  [1] "HS"           "HS"           "HS"           "1st-4th"      "5th-6th"      "7th-8th"      "HS"          
#  [8] "Assoc-acdm"   "Assoc-voc"    "Bachelors"    "Doctorate"    "HS-grad"      "Masters"      "Preschool"   
# [15] "Prof-school"  "Some-college"
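The same replacement can be written with case_when, the approach tried in the question, as long as a fallback branch keeps every value that does not match. This is a minimal sketch, assuming dplyr is installed and the column is character; the small s_data below is a hypothetical stand-in for the real data. The condition is computed from s_data itself rather than from data, since the reported error (48842 rows versus 48182) suggests the two data frames differ in length.

```r
library(dplyr)

# Hypothetical stand-in for s_data; the real data has many more rows.
s_data <- data.frame(
  education = c("Preschool", "1st-4th", "5th-6th", "HS-grad", "Bachelors"),
  stringsAsFactors = FALSE
)

s_data$education <- case_when(
  # Rows in any of these categories are relabelled as "kids" ...
  s_data$education %in% c("Preschool", "1st-4th", "5th-6th", "7th-8th", "9th") ~ "kids",
  # ... and every other row keeps its original value instead of becoming NA.
  TRUE ~ s_data$education
)

s_data$education
```

Without the `TRUE ~ ...` branch, every row that fails the first condition becomes NA, which is consistent with the NA values visible in the error message.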

If is.factor(s_data$education) is true, then make sure you add the new level first (or make sure it is already present) and then reassign:

educ <- factor(c("10th", "11th", "12th", "1st-4th", "5th-6th", "7th-8th", "9th", "Assoc-acdm", "Assoc-voc", "Bachelors", "Doctorate", "HS-grad", "Masters", "Preschool", "Prof-school", "Some-college"))
educ
#  [1] 10th         11th         12th         1st-4th      5th-6th      7th-8th      9th          Assoc-acdm   Assoc-voc   
# [10] Bachelors    Doctorate    HS-grad      Masters      Preschool    Prof-school  Some-college
# 16 Levels: 10th 11th 12th 1st-4th 5th-6th 7th-8th 9th Assoc-acdm Assoc-voc Bachelors Doctorate HS-grad ... Some-college
levels(educ)
#  [1] "10th"         "11th"         "12th"         "1st-4th"      "5th-6th"      "7th-8th"      "9th"         
#  [8] "Assoc-acdm"   "Assoc-voc"    "Bachelors"    "Doctorate"    "HS-grad"      "Masters"      "Preschool"   
# [15] "Prof-school"  "Some-college"
# Append the new level. Prepending it would shift every existing label,
# because `levels<-` relabels the underlying integer codes by position.
levels(educ) <- c(levels(educ), "HS")
educ[educ %in% c("9th", "10th", "11th", "12th")] <- "HS"
educ
#  [1] HS           HS           HS           1st-4th      5th-6th      7th-8th      HS           Assoc-acdm   Assoc-voc   
# [10] Bachelors    Doctorate    HS-grad      Masters      Preschool    Prof-school  Some-college
# 17 Levels: 10th 11th 12th 1st-4th 5th-6th 7th-8th 9th Assoc-acdm Assoc-voc Bachelors Doctorate HS-grad ... HS

At some point, you may want/need to remove the now-absent levels from your data.
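As a rough sketch of that cleanup in base R, droplevels() removes levels that no value uses any more. The short vector here is made up for illustration. The second half shows an alternative that may be simpler: renaming several levels to the same label merges them in one step, so there is nothing left to drop.

```r
# Made-up sample; the real column has many more values.
educ <- factor(c("9th", "10th", "HS-grad", "Bachelors", "11th"))

# Add the new level at the end, then reassign the matching values.
levels(educ) <- c(levels(educ), "HS")
educ[educ %in% c("9th", "10th", "11th", "12th")] <- "HS"

# "9th", "10th" and "11th" are still listed as levels although unused.
levels(educ)

# Remove the unused levels.
educ <- droplevels(educ)
levels(educ)

# Alternative: rename the levels themselves; duplicate labels are merged.
educ2 <- factor(c("9th", "10th", "HS-grad", "Bachelors", "11th"))
levels(educ2)[levels(educ2) %in% c("9th", "10th", "11th", "12th")] <- "HS"
levels(educ2)
```

Renaming the levels avoids touching the individual values at all, which also sidesteps the need to add the new level beforehand.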

This might be easier with the forcats package from the tidyverse; see the related questions "Cleaning up factor levels (collapsing multiple levels/labels)" and "Grouping 2 levels of a factor in R".
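For illustration, a forcats version could look like the sketch below. It assumes the forcats package is installed and uses a made-up sample vector; the grouping into "kids" and "HS" is only an example of how the categories might be defined.

```r
library(forcats)

# Made-up sample; the real column has many more values.
educ <- factor(c("Preschool", "1st-4th", "9th", "10th", "HS-grad", "Bachelors"))

# Each argument is new_level = c(old levels to merge into it).
# Levels that are not mentioned are left unchanged.
educ <- fct_collapse(
  educ,
  kids = c("Preschool", "1st-4th"),
  HS   = c("9th", "10th")
)

as.character(educ)
levels(educ)
```

fct_collapse() works on factors and on character vectors, so it does not matter here which of the two classes the column has.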
