Recoding character variables within the pipe operator

Question

I'm trying to build a logistic regression model from a survey data set. I'm interested in looking at the influence of incentive type (e.g., giftcard) and grade level of student (freshman, sophomore, etc.) to predict whether s/he completed the survey. The data frame has hundreds of variables, so my first step is to keep only what I need, using the tidyverse pipe operator (%>%) to:

1) Select the four variables of interest: If the student finished the survey (FINISHED), the campus location (CAMPUS), incentive type (INCENTIVE), and grade level of each student (LEVEL).

2) Keep only responses from the campus of interest ("smith") and only the three incentive types of interest, since "other" isn't very meaningful in this case.

I try running the model, but it will not work until I recode the character strings into numeric variables (0, 1, 2...) and specify that they are factors. I've read extensively in other forums that you can use as.factor() and recode() for each variable. But it seems cumbersome to recode each variable, assign the result to a new variable, and then repeat the process with as.factor().

Am I able to recode the character strings within the pipe as numeric variables (e.g., freshman = 0, sophomore = 1, junior = 2, etc.) and then set them as factors using as.factor()? I attempted doing it within the pipe, but I receive an error message in return. Or do these operations need to happen before filtering?
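For reference, a minimal sketch of converting several character columns at once inside the pipe, assuming dplyr 1.0 or later (for `across()`) and a small hypothetical data frame standing in for the survey. One `mutate()` call handles every listed column, so there is no need to create and reassign a new variable for each one.

```r
library(dplyr)

# Hypothetical toy data standing in for the real survey columns
toy <- data.frame(
  INCENTIVE = c("giftcard", "opinion", "giftcard"),
  LEVEL     = c("freshman", "junior", "sophomore"),
  stringsAsFactors = FALSE
)

# Convert every listed character column to a factor in one mutate() call
toy <- toy %>%
  mutate(across(c(INCENTIVE, LEVEL), factor))

str(toy)
```

By default factor() orders levels alphabetically, so LEVEL comes out as freshman, junior, sophomore here; pass `levels =` explicitly when the order (and therefore the reference category) matters.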

Could anyone offer any pointers? Below is the code I am using:

library(dplyr)

survey <- read.csv("SURVEY2017.csv")

survey1 <- survey %>% 
  select(FINISHED, CAMPUS, INCENTIVE, LEVEL) %>%
  filter(CAMPUS == "smith") %>%
  filter(INCENTIVE %in% c("A chance to win one of ten $100 Visa gift cards",
                          "A chance to win one of three $500 Visa gift cards",
                          "I wanted my opinions to be heard by faculty, staff, and the administration"))

model <- glm(FINISHED ~ INCENTIVE + LEVEL, family = "binomial",
             data = survey1)

Thank you!

Please include reproducible sample data. That will make it much easier for others to provide targeted help; see also how to provide a [minimal reproducible example/attempt](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Maurits Evers, Nov 30 '18 at 02:10
can you clarify why you need to recode the `LEVEL` variable as numeric? to me `... %>% mutate(LEVEL=factor(LEVEL,level=c("freshman","sophomore",...)))` would seem more natural (you could make it an ordered factor, or set successive-differences contrasts, depending on how you want to structure the model) — Ben Bolker, Nov 30 '18 at 02:42
@BenBolker Thanks, Ben. I thought it would be best that I recode my variables as 0, 1, 2, etc. I guess that is not necessary. I'll use the mutate function like you mentioned. — bjk127, Nov 30 '18 at 13:13
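A hedged sketch of the approach suggested in the comment above, using a hypothetical toy column and an assumed four-grade order (adjust both to the real data): factor() with explicit levels replaces the numeric recoding step, and the same column can instead be made an ordered factor or given successive-differences contrasts via MASS::contr.sdif().

```r
library(dplyr)

# Hypothetical toy data; replace with the filtered survey data frame
toy <- data.frame(
  LEVEL = c("junior", "freshman", "sophomore", "senior", "freshman"),
  stringsAsFactors = FALSE
)

# Assumed grade order; adjust to match the categories in the real data
grade_order <- c("freshman", "sophomore", "junior", "senior")

toy <- toy %>%
  mutate(
    # Unordered factor with an explicit level order (first level = reference)
    LEVEL_f   = factor(LEVEL, levels = grade_order),
    # Ordered factor: glm() uses polynomial contrasts for it by default
    LEVEL_ord = factor(LEVEL, levels = grade_order, ordered = TRUE)
  )

# Successive-differences contrasts from MASS: each coefficient compares
# a grade with the one immediately before it
contrasts(toy$LEVEL_f) <- MASS::contr.sdif(4)

levels(toy$LEVEL_f)
```

Which coding to use depends on the question being asked of the model: treatment contrasts compare every grade with freshman, while successive differences compare neighbouring grades.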

score 1 · Answer 1 · answered Nov 30 '18 at 02:33

First of all, it is usually a good idea to provide a minimal working example (MWE) with your question, and this would include a toy dataset.

Based on your question, you want to recode the variable into numeric codes first and then convert it to a factor. There are many ways to do this with dplyr, but I really like dplyr::case_when() when there are more than two categories to recode. I then wrap it in factor() and specify the levels and labels.

library(magrittr)
library(dplyr)

# Toy data standing in for the survey
data <- data.frame(FINISHED = sample(c("Y", "N"), 1000, replace = TRUE), 
                   CAMPUS = sample(c("smith", "campus A", "campus B"), 1000, replace = TRUE), 
                   INCENTIVE = sample(c("Gift cards", "Heard by faculty"), 1000, replace = TRUE), 
                   LEVEL = sample(c("freshman", "sophomore", "junior"), 1000, replace = TRUE), 
                   stringsAsFactors = FALSE)

# Map each grade to a numeric code with case_when(), then turn the codes
# into a factor whose labels restore the original grade names
data <- data %>% 
  mutate(LEVEL = factor(dplyr::case_when(
    LEVEL == "freshman" ~ 0,
    LEVEL == "sophomore" ~ 1, 
    LEVEL == "junior" ~ 2
  ), levels = 0:2, labels = c("freshman", "sophomore", "junior")))

data structure:

> str(data)
'data.frame':   1000 obs. of  4 variables:
 $ FINISHED : chr  "Y" "N" "Y" "N" ...
 $ CAMPUS   : chr  "campus B" "campus A" "smith" "campus B" ...
 $ INCENTIVE: chr  "Gift cards" "Heard by faculty" "Gift cards" "Gift cards" ...
 $ LEVEL    : Factor w/ 3 levels "freshman","sophomore",..: 3 3 2 2 2 1 1 2 2 2 ...
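As a hedged end-to-end sketch on made-up toy data (mirroring the columns above, with an arbitrary seed): before calling glm(), the character response FINISHED also needs to become 0/1 (or a factor), since a character response typically makes glm() with the binomial family stop with an error. Here the predictors are converted with factor() and explicit levels directly, without an intermediate numeric code.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the toy data are reproducible
n <- 1000
toy <- data.frame(
  FINISHED  = sample(c("Y", "N"), n, replace = TRUE),
  CAMPUS    = sample(c("smith", "campus A", "campus B"), n, replace = TRUE),
  INCENTIVE = sample(c("Gift cards", "Heard by faculty"), n, replace = TRUE),
  LEVEL     = sample(c("freshman", "sophomore", "junior"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

smith <- toy %>%
  filter(CAMPUS == "smith") %>%
  mutate(
    # Binary response: 1 = finished, 0 = not finished
    FINISHED  = as.integer(FINISHED == "Y"),
    INCENTIVE = factor(INCENTIVE),
    # Explicit order makes "freshman" the reference category
    LEVEL     = factor(LEVEL, levels = c("freshman", "sophomore", "junior"))
  )

model <- glm(FINISHED ~ INCENTIVE + LEVEL, family = binomial, data = smith)
summary(model)
```

The model then has one coefficient for the second incentive type and one for each grade above freshman; the fitted values carry no meaning here because the toy data are random.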

