I'm trying to build a logistic regression model from a survey data set. I'm interested in how incentive type (e.g., gift card) and student grade level (freshman, sophomore, etc.) predict whether a student completed the survey. The data frame has hundreds of variables, so my first step is to keep only what I need, using the pipe operator in tidyverse to:
1) Select the four variables of interest: whether the student finished the survey (FINISHED), the campus location (CAMPUS), the incentive type (INCENTIVE), and the grade level of each student (LEVEL).
2) Keep only responses from one campus of interest ("smith") and from three incentive types, since "other" isn't very meaningful in this case.
I try running the model, but it will not work until I recode the character strings into numeric values (0, 1, 2...) and specify that they are factors. I've read extensively in other forums that you can use "as.factor" and "recode" for each variable. But it seems cumbersome to recode each variable, assign the result to a new variable, and then repeat the process with as.factor.
Am I able to recode the character strings within the piping operator as numeric variables (e.g., freshman = 0, sophomore = 1, junior = 2, etc.) and then set as factors using as.factor()? I attempted doing it within the piping operator, but I receive an error message in return. Or does one need to do these operations before filtering?
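To make the goal concrete, here is a minimal sketch of the kind of in-pipe conversion I have in mind, run on a made-up toy data frame. The data values, the "yes"/"no" coding of FINISHED, and the names toy and toy1 are assumptions for illustration only, not taken from my actual data; mutate() is from dplyr and factor() is base R.

```r
library(dplyr)

# Toy stand-in for the survey data; all values here are invented.
toy <- data.frame(
  FINISHED = c("yes", "no", "yes", "yes", "no", "yes"),
  LEVEL    = c("freshman", "sophomore", "junior", "senior", "freshman", "junior"),
  stringsAsFactors = FALSE
)

toy1 <- toy %>%
  mutate(
    # Assumes FINISHED is stored as "yes"/"no"; converts it to 1/0
    FINISHED = as.integer(FINISHED == "yes"),
    # Convert the strings straight to a factor with an explicit level order,
    # with no intermediate 0/1/2 numeric codes
    LEVEL = factor(LEVEL, levels = c("freshman", "sophomore", "junior", "senior"))
  )

str(toy1)
```

If something like this is valid, the first level listed ("freshman" here) would presumably act as the reference category in glm().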
Could anyone offer any pointers? Below is the code I am using:
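library(dplyr)  # provides %>%, select(), and filter()

survey <- read.csv("SURVEY2017.csv")

# Keep the four variables of interest, then one campus and three incentive types
survey1 <- survey %>%
  select(FINISHED, CAMPUS, INCENTIVE, LEVEL) %>%
  filter(CAMPUS == "smith") %>%
  filter(INCENTIVE %in% c(
    "A chance to win one of ten $100 Visa gift cards",
    "A chance to win one of three $500 Visa gift cards",
    "I wanted my opinions to be heard by faculty, staff, and the administration"
  ))

# Logistic regression of survey completion on incentive type and grade level
model <- glm(FINISHED ~ INCENTIVE + LEVEL, family = "binomial", data = survey1)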
Thank you!