I have example data as follows:
df <- data.frame(Q1_A = c("This is a reason", NA, "This is a reason", NA),
Q1_B = c("This is another reason", "This is another reason", NA, NA))
Each question had multiple possible answers, so the answers had to be split out into separate columns. The NA values are therefore not real NAs.
I would like to run a regression of the form:
lm(y ~ Q1_A + Q1_B + ...)
which then gives output like:
Coefficients:
(Intercept)         Q1_A         Q1_B
   34.66099     -0.02058     -1.58728
I guess this means I need to turn all the NA values into a base level.
What is the best way to turn these variables into dummies?
Desired output:
df <- data.frame(Q1_A = c("This is a reason", "Baselevel", "This is a reason", "Baselevel"),
Q1_B = c("This is another reason", "This is another reason", "Baselevel", "Baselevel"))
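One possible way to get from the example data to the desired output, as a minimal sketch rather than a definitive answer. It assumes the columns are character vectors and that every NA simply means "option not selected". The label "Baselevel" is taken from the desired output above; the `dummies` data frame is a hypothetical extra that shows the same information as 0/1 indicators.

```r
# Example data from the question (stringsAsFactors = FALSE keeps the
# columns as character vectors on older R versions too)
df <- data.frame(
  Q1_A = c("This is a reason", NA, "This is a reason", NA),
  Q1_B = c("This is another reason", "This is another reason", NA, NA),
  stringsAsFactors = FALSE
)

# Replace every NA with an explicit "Baselevel" label
df[is.na(df)] <- "Baselevel"

# Convert each column to a factor and make "Baselevel" the reference level
df[] <- lapply(df, function(x) relevel(factor(x), ref = "Baselevel"))

# Hypothetical alternative: plain 0/1 indicator columns (1 = option selected)
dummies <- as.data.frame(lapply(df, function(x) as.integer(x != "Baselevel")))
```

With the default treatment contrasts, `lm()` compares each factor level against the first level, so the factor columns should give one coefficient per option relative to "Baselevel". If the coefficients should be labelled just `Q1_A` and `Q1_B`, the 0/1 columns in `dummies` may be the more convenient input.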