
Subsetting a large data frame leaves us with a factor variable whose levels need reordering and whose unused levels need dropping. A reprex is below:

library(tidyverse)

set.seed(1234)

data <- c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass", "5th Std. Pass", 
          "6th Std. Pass", "Diploma / certificate course", "Graduate", "No Education")

education <-  factor(sample(data, size = 5, replace = TRUE), 
                     levels = c(data, "Data not available"))

survey <-  tibble(education)

The code further below, as per this answer, achieves what we want, but we'd like to integrate the reordering and dropping of factor levels into our piped recoding of the survey.

recoded_s <- survey %>%
  mutate(education = fct_collapse(
    education,
    "None" = "No Education",
    "Primary" = c("5th Std. Pass", "6th Std. Pass"),
    "Secondary" = c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass"),
    "Tertiary" = c("Diploma / certificate course", "Graduate")
  ))

recoded_s$education
#> [1] Secondary Primary   Primary   Primary   Tertiary 
#> Levels: Secondary Primary Tertiary None Data not available


# Re-ordering the levels and dropping the unwanted one
factor(recoded_s$education, levels = c("None", "Primary", "Secondary", "Tertiary"))
#> [1] Secondary Primary   Primary   Primary   Tertiary 
#> Levels: None Primary Secondary Tertiary

Any pointers would be much appreciated!

zx8754
Fons MA

1 Answer


I'm not sure I understand. Could you elaborate why wrapping everything inside a mutate call doesn't suffice?

library(tidyverse)  # attaches dplyr, forcats and purrr, so a separate library(forcats) is not needed
survey %>%
    mutate(
        education = fct_collapse(
            education,
            "None" = "No Education",
            "Primary" = c("5th Std. Pass", "6th Std. Pass"),
            "Secondary" = c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass"),
            "Tertiary" = c("Diploma / certificate course", "Graduate")),
        education = factor(education, levels = c("None", "Primary", "Secondary", "Tertiary")))

Alternative using dplyr::recode

lvls <- list(
    "No Education" = "None",
    "5th Std. Pass" = "Primary",
    "6th Std. Pass" = "Primary",
    "10th Std. Pass" = "Secondary",
    "11th Std. Pass" = "Secondary",
    "12th Std. Pass" = "Secondary",
    "Diploma / certificate course" = "Tertiary",
    "Graduate" = "Tertiary")
survey %>%
    mutate(
        education = factor(recode(education, !!!lvls), unique(map_chr(lvls, 1))))
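To make the `levels` argument in the last line less opaque, here is a minimal sketch that rebuilds the same `lvls` mapping and evaluates the expression on its own. The name `new_levels` is introduced purely for illustration and is not part of the answer above.

```r
library(purrr)

# Same old-to-new mapping as in the answer above
lvls <- list(
    "No Education" = "None",
    "5th Std. Pass" = "Primary",
    "6th Std. Pass" = "Primary",
    "10th Std. Pass" = "Secondary",
    "11th Std. Pass" = "Secondary",
    "12th Std. Pass" = "Secondary",
    "Diploma / certificate course" = "Tertiary",
    "Graduate" = "Tertiary")

# map_chr(lvls, 1) extracts the first element of every list entry as a
# character vector; unique() then keeps each new label once, in order of
# first appearance
new_levels <- unique(map_chr(lvls, 1))
new_levels
#> [1] "None"      "Primary"   "Secondary" "Tertiary"
```

Because the level order is taken from the order of the entries in `lvls`, rearranging the list is enough to change the order of the resulting factor levels.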
Maurits Evers
  • Thanks for that! I genuinely didn't realise I could just plug it in... At the same time, I was hoping for something more succinct either in forcats or elsewhere that would allow for those actions to be carried out simultaneously; it seems like a fairly common thing to want to do? – Fons MA Oct 18 '18 at 10:28
  • @FonsMA I've added an alternative approach using `dplyr::recode` which requires a `list` of the old to new factor level mappings. Please take a look. – Maurits Evers Oct 18 '18 at 10:52
  • Thanks Maurits, I reckon your first answer is actually better – Fons MA Oct 18 '18 at 11:24
  • @FonsMA No worries:-) I think I prefer the second approach, as it can be written in a single `mutate` line with a `list` defining the old/new `factor` levels. This seems slightly more generalisable and robust, because we're not explicitly hard-coding `factor` levels twice (once within `fct_collapse` and once within `factor`). Anyway, perhaps there will be a better answer. – Maurits Evers Oct 18 '18 at 12:05
  • I agree nothing looks particularly nice... :D For your second solution, it may again be my ignorance, not sure I know what the `unique(map_chr(lvls,1))` is doing there.. so let's see if people in other regions know what they're doing. I'm knocking off! – Fons MA Oct 18 '18 at 13:04
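  • @FonsMA For what it's worth: `unique(map_chr(lvls, 1))` sets the `factor` levels to the unique new labels in `lvls`: `map_chr(lvls, 1)` pulls the first (and only) element of each `list` entry into a `character` vector, and `unique()` drops the duplicates. – Maurits Evers Oct 18 '18 at 13:13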
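
For readers looking for the forcats-only route asked about in the comments, one possible sketch chains `fct_collapse()`, `fct_relevel()` and `fct_drop()` inside a single `mutate()`. This is not from the thread. The example values below are fixed by hand for illustration (they are not the question's random sample), and the names `lvl_raw` and `recoded` are assumptions. `fct_drop()` is restricted with `only =` because a plain `fct_drop()` would also remove "None", which has no observations here.

```r
library(dplyr)
library(forcats)

# Hand-picked example data with the same level set as the question's reprex
lvl_raw <- c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass", "5th Std. Pass",
             "6th Std. Pass", "Diploma / certificate course", "Graduate", "No Education")
survey <- tibble(education = factor(
    c("12th Std. Pass", "5th Std. Pass", "6th Std. Pass", "Graduate"),
    levels = c(lvl_raw, "Data not available")))

recoded <- survey %>%
    mutate(education = education %>%
        # 1. collapse the raw levels into four groups
        fct_collapse(
            "None" = "No Education",
            "Primary" = c("5th Std. Pass", "6th Std. Pass"),
            "Secondary" = c("10th Std. Pass", "11th Std. Pass", "12th Std. Pass"),
            "Tertiary" = c("Diploma / certificate course", "Graduate")) %>%
        # 2. move the groups to the front in the desired order
        fct_relevel("None", "Primary", "Secondary", "Tertiary") %>%
        # 3. drop only the unused placeholder level, keeping "None"
        fct_drop(only = "Data not available"))

levels(recoded$education)
#> [1] "None"      "Primary"   "Secondary" "Tertiary"
```

The new level names still appear twice (once in `fct_collapse()` and once in `fct_relevel()`), so this does not remove the duplication discussed above; it only keeps every step inside forcats and the pipe.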