
My predicted values are all negative. I would have expected 0s or 1s. Can anyone see where I am going wrong?

library(MASS)  # provides the birthwt data set

fold = 10
end = nrow(birthwt)
fold_2 = floor(end/fold)

df_i = birthwt[sample(nrow(birthwt)),] # random sort the dataframe birthwt

tester = df_i[1:fold_2,]  # remove first tenth of rows - USE PREDICT ON THIS DATA
trainer = df_i[-c(1:fold_2),]  # all other than the first tenth of rows - USE GLM ON THIS DATA

mod = glm(low~lwt,family=binomial,data=trainer)
ypred = predict(mod,data=tester) # predicted values
miyagi
  • When including code, make sure your example is [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Include sample input data so we can test possible solutions. This makes it much easier to help you. – MrFlick Nov 11 '14 at 22:52

1 Answer


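The default for predict.glm is to return predictions on the scale of the link function (the linear predictor). For a binomial model that is the log-odds, so negative values are expected whenever the predicted probability is below 0.5. If you want predicted probabilities of the response, use

ypred <- predict(mod, newdata=tester, type="response")

Note that the argument is newdata, not data: predict.glm has no data argument, so data=tester is silently ignored and you get fitted values for the training rows instead. Even with type="response" you get probabilities between 0 and 1 rather than 0s or 1s; apply a cutoff if you need class labels.

It may be helpful to read the ?predict.glm help file.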
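
To make the difference between the two scales concrete, here is a minimal sketch using the birthwt data from MASS. The seed, the variable names (eta, p, yhat) and the 0.5 cutoff are my own assumptions for illustration, not something from the question.

```r
library(MASS)  # provides the birthwt data set

set.seed(1)  # assumed seed, only so the random split is repeatable
n_test   <- floor(nrow(birthwt) / 10)
shuffled <- birthwt[sample(nrow(birthwt)), ]
tester   <- shuffled[seq_len(n_test), ]    # first tenth of the shuffled rows
trainer  <- shuffled[-seq_len(n_test), ]   # everything else

mod <- glm(low ~ lwt, family = binomial, data = trainer)

eta <- predict(mod, newdata = tester)                     # link scale: log-odds
p   <- predict(mod, newdata = tester, type = "response")  # probabilities

# The two are related by the inverse logit, p = 1 / (1 + exp(-eta))
same_scale <- isTRUE(all.equal(unname(p), unname(plogis(eta))))

# Class labels need an explicit cutoff; 0.5 is an assumption here
yhat <- as.integer(p > 0.5)
misclass_rate <- mean(yhat != tester$low)
```

With a rare outcome and a single weak predictor, most or all of yhat may come out as 0, so it can be worth inspecting p itself rather than only the thresholded labels.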

MrFlick
  • Well, unless you supply data, it's impossible to know if those values are reasonable. – MrFlick Nov 11 '14 at 23:55
  • Apologies MrFlick, I'm using the birthwt data set from library(MASS) and doing a 10-fold CV to determine which of the variables predict the low variable. – miyagi Nov 12 '14 at 00:06
  • Those values seem reasonable. The "low" event doesn't occur that often. If you look at the proportion of low given lwt, you see: `barplot(with(birthwt, tapply(low, cut(lwt, breaks=5), mean)))` so really in none of those groups does low occur more than 50% of the time. So regressing on lwt alone won't get you very high probabilities. – MrFlick Nov 12 '14 at 04:46
  • Here is my code:
    library(MASS)
    tenfold3 = function() {
      fold = 10
      end = nrow(birthwt)
      fold_2 = floor(end/fold)
      misclasrate=numeric()
      for(i in 1:10){
        df_i = birthwt[sample(nrow(birthwt)),] # random sort the dataframe birthwt
        #some code removed to allow posting
        misclasrate[i] = 1-sum(val_df$misclas) / nrow(val_df)
        mean_mrate = signif(mean(misclasrate),4)
        return(mean_mrate)
      }
    }
    – miyagi Nov 14 '14 at 10:57
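
For the 10-fold CV discussed in the comments, a possible sketch is below. As posted, the return() appears to sit inside the for loop, which would exit after the first fold, and the data frame is reshuffled on every iteration, so the folds would not partition the data. The helper name cv_misclass, its arguments, the seed and the 0.5 cutoff are all hypothetical choices of mine, and the model is fixed to low ~ lwt for simplicity.

```r
library(MASS)  # provides the birthwt data set

# Hypothetical helper: k-fold misclassification rate for low ~ lwt
cv_misclass <- function(data, k = 10, cutoff = 0.5) {
  # Shuffle once and give every row a fold label 1..k
  fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))
  rates <- numeric(k)
  for (i in seq_len(k)) {
    train <- data[fold_id != i, ]
    test  <- data[fold_id == i, ]
    fit   <- glm(low ~ lwt, family = binomial, data = train)
    # newdata (not data) so the held-out rows are the ones scored
    p     <- predict(fit, newdata = test, type = "response")
    rates[i] <- mean(as.integer(p > cutoff) != test$low)
  }
  mean(rates)  # returned after the loop, once every fold has been scored
}

set.seed(1)  # assumed seed for repeatability
cv_rate <- cv_misclass(birthwt)
```

Each row is used for testing exactly once, and the function only returns after all k folds have been evaluated.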