
I found this helpful answer to almost the same question, but it doesn't quite do what I need.

I have respondents' age, a continuous variable, and I'd like to recode it to a categorical variable using tidyverse. The link above explains the functions cut_number(), cut_interval(), and cut_width(), but those don't work for me because I'd like to recode into categories that I've already determined ahead of time, namely the ranges 18-34, 35-54, and 55+. None of those cut functions allow me to do this (or at least I didn't see how).

I was able to get my code to run without tidyverse, using:

data$age[data$"Age(Self-report)"<35] <- "18-34"
data$age[data$"Age(Self-report)">34 & data$"Age(Self-report)"<55] <- "35-54"
data$age[data$"Age(Self-report)">=55] <- "55+"

but I'm trying to be consistent in my coding style and would like to learn how to do this in tidyverse. Thanks for any and all help!

David Buck
Chris J
  • Those ggplot `cut_*` functions are just convenience wrappers around the base `cut` function. Have you tried that, setting the breaks yourself? – camille Apr 18 '20 at 17:29
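
As a rough illustration of the suggestion in the comment above, base `cut()` can be given the breaks directly. This is only a sketch: the sample data frame is hypothetical (the column name is borrowed from the question), and it assumes that ages below 18 do not occur, since they would fall outside the breaks and come out as NA.

```r
library(dplyr)

# Hypothetical sample data; only the column name comes from the question.
data <- data.frame(`Age(Self-report)` = c(18, 34, 35, 54, 55, 90),
                   check.names = FALSE)

data <- data %>%
  mutate(age = cut(
    `Age(Self-report)`,
    breaks = c(18, 35, 55, Inf),          # edges of the three brackets
    labels = c("18-34", "35-54", "55+"),  # one label per interval
    right  = FALSE                        # left-closed: [18,35), [35,55), [55,Inf)
  ))

data
```

Note that `cut()` returns a factor rather than a character vector, which is often convenient for plotting or tabulating in a fixed order.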

1 Answer


A tidyverse approach would make use of dplyr::case_when to recode the variable like so:

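library(dplyr)

# Recode the continuous age into the three predetermined brackets
data %>%
  mutate(age = case_when(
    `Age(Self-report)` < 35 ~ "18-34",
    `Age(Self-report)` >= 35 & `Age(Self-report)` < 55 ~ "35-54",
    `Age(Self-report)` >= 55 ~ "55+"  # >= so that age 55 is not left as NA
  ))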
stefan
  • Thank you so much! In my fervor of posting my first question I forgot to include the "case_when" code that I tried using! `data_2 <- data_2 %>%` `mutate(age = case_when(` `"Age(Self-report)" %in% seq(from="18", to="34") ~ "18-34",` `"Age(Self-report)" %in% seq(from="35", to="54") ~ "35-54",` `"Age(Self-report)" %in% seq(from="55", to="90") ~ "55+"))` I couldn't figure out what was wrong, though. Looks like I was making it more complex than it needed to be, as usual! Thanks again! – Chris J Apr 18 '20 at 17:18
  • Hi @ChrisJ. That will not work, because you put the varname in double quotes, which means you are checking whether the character string "Age(Self-report)" is part of the interval defined by "seq". When using an awkward varname like yours ;), you have to use backticks like I did. BTW: Have a look at `dplyr::between` if you want to check whether a number is in an interval. I don't use it that often, but it's good to know. – stefan Apr 18 '20 at 17:27
  • @ChrisJ Don’t forget to tick the answer if it solved your problem. – Mark Neal Apr 18 '20 at 19:38
  • Hah yes I need to rename them, I think it's a carryover from our survey item names! But it's good to understand how the quotes work compared to the backticks, with bad variable names! :) Thanks again. – Chris J Apr 19 '20 at 00:37
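
To make the point from the comments above concrete, here is a minimal sketch of the same recode written with `dplyr::between()`, together with a check of what the double-quoted name actually evaluates to. The sample data frame and the names `recoded` and `quoted_check` are hypothetical, and the sketch assumes ages are whole numbers, since `between()` is inclusive at both ends and a value such as 34.5 would match no bracket.

```r
library(dplyr)

# Hypothetical sample data; only the column name comes from the question.
data <- data.frame(`Age(Self-report)` = c(18, 34, 35, 54, 55, 90),
                   check.names = FALSE)

recoded <- data %>%
  mutate(age = case_when(
    between(`Age(Self-report)`, 18, 34) ~ "18-34",  # backticks refer to the column
    between(`Age(Self-report)`, 35, 54) ~ "35-54",
    `Age(Self-report)` >= 55 ~ "55+"
  ))

# For contrast: in double quotes the name is just a string, so this compares
# the literal text "Age(Self-report)" with the numbers 18 to 34.
quoted_check <- "Age(Self-report)" %in% 18:34

recoded
quoted_check
```

Because the quoted version never refers to the column at all, no row can match any of the conditions, which is why backticks are needed for non-syntactic column names.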