
I am running a logistic regression and I notice that each unique character string in my vector receives its own parameter. Is R optimizing the prediction of the outcome variable based on each unique value within the vector?

  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Since we have no idea what your data looks like it's hard to say for sure what's going on. But it sounds like you are trying to include a categorical variable as a covariate in your model. How exactly did you want such a variable to be coded in your model? – MrFlick Jun 12 '18 at 19:12

1 Answer


Sorry, I'm a little new to Stack Overflow. Here is a reproducible example:

library(stats)
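# matrix() coerces every value to character, so all four columns start out as strings
df = as.data.frame(matrix(c("a","a","b","c","c","b","a","a","b","b","c",1,0,0,0,1,0,1,1,0,1,0,1,0,100,10,8,3,5,6,13,10,4,"SF","CHI","NY","NY","SF","SF","CHI","CHI","SF","CHI","NY"), ncol = 4))
colnames(df) = c("letter","number1","number2","city")
# Categorical predictors: a factor enters the model as one 0/1 dummy column per non-reference level
df$letter = as.factor(df$letter)
df$city = as.factor(df$city)
# Caution: if these columns are factors (the default before R 4.0), as.numeric() returns level codes, not the printed values
df$number1 = as.numeric(df$number1)
df$number2 = as.numeric(df$number2)

# No family argument, so glm() defaults to gaussian (a linear model); a logit needs family = binomial
glm(number1 ~ ., data = df)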

#Call:  glm(formula = number1 ~ ., data = df)

#Coefficients:
#(Intercept)      letterb      letterc      number2       cityNY       citySF
#    1.57191     -0.25227     -0.01424      0.04593     -0.69269     -0.20634

#Degrees of Freedom: 10 Total (i.e. Null);  5 Residual
#Null Deviance:     2.727 
#Residual Deviance: 1.35    AIC: 22.14

How is the logit treating city?
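
For what it's worth, one way to check how `city` enters the model is to look at the design matrix that R builds from the formula. The following is only a minimal sketch, assuming the default treatment contrasts; it rebuilds the same toy values column by column so the types are explicit, and the name `X` is just for illustration.

```r
# Same toy values as above, built column by column so each type is explicit
df <- data.frame(
  number1 = c(1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0),
  number2 = c(1, 0, 100, 10, 8, 3, 5, 6, 13, 10, 4),
  city    = factor(c("SF", "CHI", "NY", "NY", "SF", "SF",
                     "CHI", "CHI", "SF", "CHI", "NY"))
)

# Factor levels are sorted; the first level is the reference category
levels(df$city)
# "CHI" "NY"  "SF"

# The design matrix implied by the formula: one 0/1 column per non-reference level
X <- model.matrix(number1 ~ number2 + city, data = df)
colnames(X)
# "(Intercept)" "number2"     "cityNY"      "citySF"

# Each row has a 1 in the column matching its city, or all zeros for CHI
head(X)
```

Under these assumptions, CHI is absorbed into the intercept, and the `cityNY` and `citySF` coefficients are differences from CHI rather than separately optimized effects. Passing the same formula to `glm()` with `family = binomial` would use this same dummy coding for a logit.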