I am trying to collapse factor levels. At first, the count(a7_edu2) output shows that the collapse has worked, but when I check the structure with str() or look at the data in the RStudio viewer, the actual variable is unchanged.

Any advice on saving the result as a new variable or overwriting the old one? Thanks!

I have used fct_collapse() to collapse the levels into three categories, with mutate() to create a variable holding the new levels. I have also tried saving into a new variable and using transmute() instead of mutate(). I would be satisfied with either a new variable or replacing the old one.

SCI_dem %>%
  mutate(a7_edu2 = fct_collapse(a7_edu2,
    Highschool = c("Elm School", "Grade 7 or 8", "Grade 9 to 11", "High School Diploma", "G.E.D"),
    Diploma = c("Diploma or Certificate from trade tech school", "Diploma or Certificate from community college or CEGEP"),
    Bachelors = c("Bachelor degree", "Degree (Medicine, Dentistry etc)", "Masters degree", "Doctorate")
  )) %>%
  count(a7_edu2) # this is the result I want, but when I check the structure, it hasn't been saved!


str(SCI_dem$a7_edu2)

I expected the output to be 'Factor w/ 4 levels "Highschool","Diploma","Bachelors","Other"', but instead it gave the original 'Factor w/ 13 levels "Elm School","Grade 7 or 8",..: 8 7 6 10 7 7 8 3 7 10 ...'.


UPDATED QUESTION: Saving the one variable to a new data frame (SCI_collapse) works. However, when I try to save other newly collapsed variables to the same data frame, it overwrites the previous collapses. I have tried specifying new columns such as SCI_collapse$edu, but that renames the existing variables in the data frame. How can I collapse multiple variables and add each of them to a new data frame? Any suggestions for saving the results or writing a pipe?

SCI_collapse <- SCI_dem %>%
  mutate(a7_edu2 = fct_collapse(a7_edu2,
                                Highschool = c("Elm School",
                                               "Grade 7 or 8",
                                               "Grade 9 to 11",
                                               "High School Diploma",
                                               "G.E.D"),
                                Diploma = c("Diploma or Certificate from trade tech school",
                                            "Diploma or Certificate from community college or CEGEP"),
                                Bachelors = c("Bachelor degree",
                                              "Degree (Medicine, Dentistry etc)",
                                              "Masters degree", "Doctorate")))
Cassandra
  • The functions in `dplyr` like `mutate` return new/updated data frames, they do not update the original data frame in place. Be sure to save the results from the `mutate` to some variable. – MrFlick Jun 10 '19 at 21:27
  • Despite my earlier attempts at specifying a new variable, now saving to a new dataset works, thank you! ```SCI_collapse <- SCI_dem %>% mutate(a7_edu2 = fct_collapse(a7_edu2, Highschool = c("Elm School", "Grade 7 or 8", "Grade 9 to 11", "High School Diploma", "G.E.D"), Diploma = c("Diploma or Certificate from trade tech school" , "Diploma or Certificate from community college or CEGEP"), Bachelors = c("Bachelor degree", "Degree (Medicine, Dentistry etc)", "Masters degree", "Doctorate") ))``` – Cassandra Jun 11 '19 at 01:33
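
To illustrate the point in the comments, here is a minimal sketch of why the first attempt appeared to work in count() but did not stick. The data frame `toy` and its column `edu` are made up for illustration and only stand in for SCI_dem; the sketch assumes dplyr and forcats are installed.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data standing in for SCI_dem
toy <- tibble(edu = factor(c("Grade 7 or 8", "G.E.D", "Doctorate", "Masters degree")))

# Without assignment: the collapsed result is printed by count() and then discarded
toy %>%
  mutate(edu = fct_collapse(edu,
    Highschool = c("Grade 7 or 8", "G.E.D"),
    Bachelors  = c("Masters degree", "Doctorate")
  )) %>%
  count(edu)

# toy itself is untouched, so it still has the original levels
levels_before <- nlevels(toy$edu)

# With assignment: the data frame returned by mutate() is stored back into toy
toy <- toy %>%
  mutate(edu = fct_collapse(edu,
    Highschool = c("Grade 7 or 8", "G.E.D"),
    Bachelors  = c("Masters degree", "Doctorate")
  ))

levels_after <- nlevels(toy$edu)
```

Comparing `levels_before` with `levels_after` shows that only the assigned version changes the stored factor.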

1 Answer

This is what I ended up doing:

# Collapse levels (education)
SCI_dem <- SCI_dem %>%
  mutate(a7_edu2_col = fct_collapse(a7_edu2,        # Save as new variable ending in _col
    Highschool = c("Elm School", "Grade 7 or 8", "Grade 9 to 11", "High School Diploma", "G.E.D"),
    Diploma = c("Diploma or Certificate from trade tech school", "Diploma or Certificate from community college or CEGEP"),
    Bachelors = c("Bachelor degree", "Degree (Medicine, Dentistry etc)", "Masters degree", "Doctorate"),
    Other = c("Other", "Prefer not to answer")
  ), a7_edu2_col = droplevels(a7_edu2_col)) %>%      # drop empty levels of _col
  rename(a7_edu2_unc = a7_edu2)

I now have new variables ending in _col and have renamed the old variables to end in _unc (for uncollapsed). Then I clean things up by removing the columns ending in _unc.

SCI_dem <- select(SCI_dem, -ends_with("_unc"))

That leaves me with an uncluttered, collapsed data frame :)
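
For the follow-up about collapsing several variables into one data frame, something along these lines may be simpler: put every collapse in a single mutate() call, give each result its own column name, and drop the originals at the end. This is only a sketch on made-up data. The data frame `toy`, the column `region`, its levels R1 to R4 and the East/West groups are all hypothetical and not taken from SCI_dem.

```r
library(dplyr)
library(forcats)

# Hypothetical toy data with two factor columns to collapse
toy <- tibble(
  edu    = factor(c("Grade 7 or 8", "G.E.D", "Doctorate", "Masters degree")),
  region = factor(c("R1", "R2", "R3", "R4"))
)

toy_collapsed <- toy %>%
  mutate(
    # each collapse gets its own new column, so none overwrites another
    edu_col = fct_collapse(edu,
      Highschool = c("Grade 7 or 8", "G.E.D"),
      Bachelors  = c("Masters degree", "Doctorate")
    ),
    region_col = fct_collapse(region,
      East = c("R1", "R2"),
      West = c("R3", "R4")
    )
  ) %>%
  select(-edu, -region)   # drop the uncollapsed originals

str(toy_collapsed)
```

Because every new column has a distinct name inside one mutate(), later collapses cannot overwrite earlier ones, and the rename step is no longer needed.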

Jamiu S.
Cassandra