I have data (s_data).

levels(as.factor(s_data$education))

"10th"          "11th"         "12th"         "1st-4th"     
"5th-6th"      "7th-8th"      "9th"          "Assoc-acdm"    
"Assoc-voc"    "Bachelors"    "Doctorate"    "HS-grad"     
"Masters"      "Preschool"    "Prof-school"  "Some-college"

I am trying to collapse several levels into one category. For example, "Preschool" and "1st-4th" would become a single category, "kids". I have tried a couple of approaches with no success.

s_data$education <- case_when(data$education %in% c("1st-4th", "5th-6th", "7th-8th",
                                      "9th","Preschool") ~ "kids") #s_data is an adjusted version of data

This approach tries to replace each row and doesn't yield anything but an error.

I have also tried our teacher's approach, but when I plotted the new data, it did not contain the new category ("kids") at all.

levels(as.factor(s_data$education)) <- c("10th","11th,", "kids", "kids", "kids","7th-8th", "9th", "Assoc-acdm",
                                         "Assoc-voc", "Bachelors","Doctorate","HS-grad", "Masters","kids", "Prof-school",
                                         "Some-college")

Do you have any ideas how I can collapse these levels into one category?

Thank you!

r2evans
Shlomi
  • First, *"doesn't yield anything but an error"* would do better if you included the error. Second, your `case_when` is incomplete: you check for one membership and reassign based on it, and then discard everything else in `education`. It is not doing a piecewise replacement; I suggest you read its documentation to understand what it is doing. Third, I don't know if it is originally a `factor` or if your `levels` call was merely to show all unique values. Can you provide an unambiguous sample of that vector? Perhaps `dput(s_data$education)` (or a subset of it if it is a large dataset). – r2evans Dec 11 '20 at 14:57
  • Error in `$<-.data.frame`(`*tmp*`, education, value = c(NA, NA, NA, NA, : replacement has 48842 rows, data has 48182. The `levels` call was merely to show all unique values. `dput(s_data$education)` just prints the value for every row. What I need is to create a new category that overwrites the categories I wish to collapse. I hope that is clearer now. – Shlomi Dec 11 '20 at 15:07

1 Answer


Because you keep calling as.factor (and we don't have your data, large as it may be), it is not clear to me if your s_data$education is class character or factor.

  • if is.character(s_data$education), then

    educ <- c("10th", "11th", "12th", "1st-4th", "5th-6th", "7th-8th", "9th", "Assoc-acdm", "Assoc-voc", "Bachelors", "Doctorate", "HS-grad", "Masters", "Preschool", "Prof-school", "Some-college")
    educ[educ %in% c("9th", "10th", "11th", "12th")] <- "HS"
    educ
    #  [1] "HS"           "HS"           "HS"           "1st-4th"      "5th-6th"      "7th-8th"      "HS"          
    #  [8] "Assoc-acdm"   "Assoc-voc"    "Bachelors"    "Doctorate"    "HS-grad"      "Masters"      "Preschool"   
    # [15] "Prof-school"  "Some-college"
    
  • if is.factor(s_data$education), then make sure you pre-add the new levels (or make sure they are already present) and then reassign:

    educ <- factor(c("10th", "11th", "12th", "1st-4th", "5th-6th", "7th-8th", "9th", "Assoc-acdm", "Assoc-voc", "Bachelors", "Doctorate", "HS-grad", "Masters", "Preschool", "Prof-school", "Some-college"))
    educ
    #  [1] 10th         11th         12th         1st-4th      5th-6th      7th-8th      9th          Assoc-acdm   Assoc-voc   
    # [10] Bachelors    Doctorate    HS-grad      Masters      Preschool    Prof-school  Some-college
    # 16 Levels: 10th 11th 12th 1st-4th 5th-6th 7th-8th 9th Assoc-acdm Assoc-voc Bachelors Doctorate HS-grad ... Some-college
    levels(educ)
    #  [1] "10th"         "11th"         "12th"         "1st-4th"      "5th-6th"      "7th-8th"      "9th"         
    #  [8] "Assoc-acdm"   "Assoc-voc"    "Bachelors"    "Doctorate"    "HS-grad"      "Masters"      "Preschool"   
    # [15] "Prof-school"  "Some-college"
    # append the new level; prepending it would rename the existing levels positionally
    levels(educ) <- c(levels(educ), "HS")
    educ[educ %in% c("9th", "10th", "11th", "12th")] <- "HS"
    educ
    #  [1] HS           HS           HS           1st-4th      5th-6th      7th-8th      HS           Assoc-acdm   Assoc-voc
    # [10] Bachelors    Doctorate    HS-grad      Masters      Preschool    Prof-school  Some-college
    # 17 Levels: 10th 11th 12th 1st-4th 5th-6th 7th-8th 9th Assoc-acdm Assoc-voc Bachelors Doctorate HS-grad ... HS
    

    At some point, you may want/need to remove the now-absent levels from your data.
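
As a minimal base-R sketch of that cleanup, using small made-up vectors rather than your real column: `droplevels()` discards levels that no longer occur. Alternatively, renaming several levels to the same label merges them in one step, which avoids leftover levels altogether. The names `educ2` and `educ3` are only for illustration.

```r
# Small illustrative vectors (assumed, not the real s_data$education).
educ2 <- factor(c("9th", "10th", "Masters"))

# Same pattern as above: add the level, reassign, then drop unused levels.
levels(educ2) <- c(levels(educ2), "HS")
educ2[educ2 %in% c("9th", "10th")] <- "HS"
educ2 <- droplevels(educ2)   # "9th" and "10th" are removed from the levels
levels(educ2)

# Alternative: rename the levels themselves; duplicate names are merged.
educ3 <- factor(c("10th", "9th", "Preschool", "1st-4th", "Bachelors", "Preschool"))
levels(educ3)[levels(educ3) %in% c("Preschool", "1st-4th")] <- "kids"
levels(educ3)
as.character(educ3)
```

Note that `levels(as.factor(x)) <- ...` cannot work as written, because the assignment needs a real object on its left-hand side; convert the column to a factor first (`s_data$education <- as.factor(s_data$education)`) and then assign to `levels(s_data$education)`.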

This might be facilitated with tidyverse's forcats package; see "Cleaning up factor levels (collapsing multiple levels/labels)" and "Grouping 2 levels of a factor in R".
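
For instance, a short sketch with `forcats::fct_collapse()`, assuming the forcats package is installed and using a small made-up vector in place of the real column; the grouping into "kids" follows the question, and levels that are not mentioned are left untouched.

```r
library(forcats)

# Illustrative data (assumed), standing in for s_data$education.
educ <- factor(c("Preschool", "1st-4th", "5th-6th", "HS-grad", "Bachelors"))

# new_level = c(old levels to merge); the old levels are removed.
educ <- fct_collapse(educ, kids = c("Preschool", "1st-4th", "5th-6th"))

table(educ)
```

On the real data this would be assigned back to the same data frame it was read from, e.g. `s_data$education <- fct_collapse(s_data$education, ...)`, so that the number of rows matches.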

r2evans