I have done regression analysis in R many times, but I cannot get the hang of what is happening in this model.
I have income and demographic data for 500 people. I am trying to understand the impact of gender on following covid protocols while controlling for age, income and education. My dependent variable (e.g. mask wearing) is a factor (0 representing no mask, and 1 representing mask worn). Age is a numeric variable between 18 and 35, gender is a character variable (M & F), income has levels from 0 to 5, and education is also coded from 0 to 5 to represent different education levels.
Here is a reproducible example:
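set.seed(42)  # fix the seed so the sampled values are the same on every run
pand_data <- data.frame(
  Age    = sample(25:30),                              # integer ages, one per row
  Edu    = sample(0:5),                                # education level, coded 0 to 5
  mask   = sample(0:1, 6, replace = TRUE),             # outcome: 0 = no mask, 1 = mask worn
  gender = sample(c("m", "f"), 6, replace = TRUE),     # character column
  income = sample(1:5, 6, replace = TRUE)              # income level
)
# Logistic regression of mask wearing on gender, controlling for the other variables
glm(mask ~ gender + Age + income + Edu, data = pand_data, family = "binomial")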
The output shows the intercept and then, instead of a single coefficient for Age, it shows Age18, Age19, ..., Age35 as separate variables. The same happens for income (income0, income1, ..., income5) and for the education levels. I converted the variables to factors and ran the same code, but that didn't work either. My end goal is to calculate odds ratios. I have used the epiR package for this previously, but it doesn't work with this model either.
I have never faced this before. I have tried tweaking many things in this code, including changing it to an lm model, but I think I am missing something, so here I am. Apologies for the long post!
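Coefficients named Age18, Age19, ... are what `glm()` prints when a column is stored as a factor or character rather than as a number, so the column types are worth checking first. Below is a minimal sketch of that check on simulated data, not the real dataset: `sim`, `fit_num` and `fit_fac` are hypothetical names, and all values are randomly generated under the assumption that they roughly mirror the description above.

```r
# Simulated stand-in for the real data (assumption: 500 rows, coded as described)
set.seed(1)
n <- 500
sim <- data.frame(
  mask   = rbinom(n, 1, 0.5),
  gender = sample(c("m", "f"), n, replace = TRUE),
  Age    = sample(18:35, n, replace = TRUE),
  income = sample(0:5, n, replace = TRUE),
  Edu    = sample(0:5, n, replace = TRUE)
)

# Check how R stores each column; Age, income and Edu are integers here
str(sim)

# Numeric predictors: a single slope per variable
fit_num <- glm(mask ~ gender + Age + income + Edu, data = sim, family = binomial)
names(coef(fit_num))

# Same model with Age stored as a factor: one dummy per level except the reference
sim_fac <- transform(sim, Age = factor(Age))
fit_fac <- glm(mask ~ gender + Age + income + Edu, data = sim_fac, family = binomial)
names(coef(fit_fac))

# A factor holding numbers converts back via as.character(), not as.numeric() alone
sim_fac$Age <- as.numeric(as.character(sim_fac$Age))

# Odds ratios are the exponentiated logit coefficients
exp(coef(fit_num))
```

If `str()` on the real data reports Age, income or Edu as `Factor` or `chr`, converting them as shown should bring back one coefficient per variable. Whether income and education are better treated as numeric or as categorical is a separate modelling decision.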