I am running a logistic regression and I am noticing that each unique character string in my vector is receiving its own parameter. Is R optimizing the prediction on the outcome variable based each collection of unique values within the vector?
Asked
Active
Viewed 132 times
0
-
When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Since we have no idea what your data looks like it's hard to say for sure what's going on. But it sounds like you are trying to include a categorical variable as a covariate in your model. How exactly did you want such a variable to be coded in your model? – MrFlick Jun 12 '18 at 19:12
1 Answers
0
sorry. A little new to stack overflow.
library(stats)
df = as.data.frame( matrix(c("a","a","b","c","c","b","a","a","b","b","c",1,0,0,0,1,0,1,1,0,1,0,1,0,100,10,8,3,5,6,13,10,4,"SF","CHI","NY","NY","SF","SF","CHI","CHI","SF","CHI","NY"), ncol = 4))
colnames(df) = c("letter","number1","number2","city")
df$letter = as.factor(df$letter)
df$city = as.factor(df$city)
df$number1 = as.numeric(df$number1)
df$number2 = as.numeric(df$number2)
glm(number1 ~ .,data=df)
#Call: glm(formula = number1 ~ ., data = df)
#Coefficients:
# (Intercept) letterb letterc number2 cityNY citySF
#1.57191 -0.25227 -0.01424 0.04593 -0.69269 -0.20634
#Degrees of Freedom: 10 Total (i.e. Null); 5 Residual
#Null Deviance: 2.727
#Residual Deviance: 1.35 AIC: 22.14
How is the logit treating city?