I have a data frame in R whose variables include race, SAT score, high school GPA, dropout status, and gender. I am trying to regress dropout on the other variables as right-hand-side inputs, but only for Black and Hispanic students, who are coded as "B" and "H" in the race column.
# Keep only Black ("B") and Hispanic ("H") students and the model variables
newdata <- subset(x.20, race %in% c("B", "H"),
                  select = c(race, individual.ind, institutional.ind, male,
                             twohousehold, foreignbornparent, parentdegree,
                             welfare, householdincome, schoolquality, SAT,
                             privateschool, apcourses, socialdistance,
                             peerinfluence, selfefficacy, selfesteem, hsgpa,
                             droppedout))
# Logit model of dropping out on all remaining variables
mylogit <- glm(droppedout ~ race + individual.ind + institutional.ind + male +
                 twohousehold + foreignbornparent + parentdegree + welfare +
                 householdincome + SAT + privateschool + apcourses + hsgpa +
                 schoolquality + socialdistance + peerinfluence +
                 selfefficacy + selfesteem,
               family = binomial, data = newdata)
library(stargazer)
# digits = 1 rounds every reported coefficient to one decimal place
stargazer(mylogit, title = "Title: Logit Regression Results", type = "latex",
          single.row = TRUE, header = FALSE, column.sep.width = "1pt",
          digits = 1, covariate.labels = c("Race"))
The code above produces a regression table in stargazer, but the coefficients are very different from the ones reported in the results I am trying to replicate. Does anyone have any idea what is going wrong? Am I correctly subsetting the data to Black and Hispanic students only?
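To show what I mean by checking the subset, here is a minimal sketch on a made-up toy data frame. The name toy and all of its values are hypothetical stand-ins for x.20, not my real data. It uses the same subset() pattern as above, then looks at which race codes remain and how many rows have no missing values, since glm() by default drops incomplete rows and could be fitting on a smaller sample than I expect.

```r
# Toy stand-in for x.20 (hypothetical values, for illustration only)
toy <- data.frame(
  race       = factor(c("B", "H", "W", "A", "B", "H", "W", "B")),
  hsgpa      = c(3.1, 2.8, 3.5, 3.9, NA, 3.0, 3.2, 2.5),
  droppedout = c(1, 0, 0, 0, 1, 1, 0, 0)
)

# Same subsetting pattern as in the question
sub <- subset(toy, race %in% c("B", "H"), select = c(race, hsgpa, droppedout))

# subset() keeps unused factor levels; droplevels() removes them
sub$race <- droplevels(sub$race)

# Counts per race code: only B and H should remain
race_counts <- table(sub$race)
print(race_counts)

# With the default na.action (na.omit), glm() drops rows with missing values,
# so the estimation sample can be smaller than nrow(sub)
n_subset   <- nrow(sub)
n_complete <- sum(complete.cases(sub))
```

If the same checks on newdata show only "B" and "H" and the expected number of complete rows, then I assume the subsetting itself is fine and the difference comes from somewhere else.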