Comparing GLM models using predict

Question

Suppose I have two models created by calling glm() on the same data but with different formulas and/or families. Now I want to compare which model is better by predicting on an unknown data. Something like this:

mod1 <- glm(formula1, family1, data)
mod2 <- glm(formula2, family2, data)
mu1 <- predict(mod1, newdata, type = "response")
mu2 <- predict(mod2, newdata, type = "response")

How can I tell which of the predictions mu1 or mu2 is better?
Is there some simple command to compute the log likelihood of a prediction?

question 1 is better suited for http://stats.stackexchange.com/ — JPC, May 15 '14 at 19:06
The `summary` function returns the deviance which is -2*logLikelihood for the data and model. You will need to say what you mean by "log-likelihood of a prediction", since the predicted values would be expected be exactly in line with the model, i.e. the LL would be 0. — IRTFM, May 15 '14 at 19:13

score 3 · Accepted Answer · edited May 23 '17 at 12:25

It would be easier to answer this with a reproducible example.

It often makes more sense to choose a family a priori rather than according too goodness of fit -- for example, if you have count (non-negative integer) responses with no obvious upper bound, your only real choice that lies strictly within the exponential family is Poisson.

set.seed(101)
x <- runif(1000)
mu <- exp(1+2*x)
y <- rgamma(1000,shape=3,scale=mu/3)
d <- data.frame(x,y)

New data:

nd <- data.frame(x=runif(100))
nd$y <- rgamma(100,shape=3,scale=exp(1+2*nd$x)/3)

Fit Gamma and Gaussian:

mod1 <- glm(y~x,family=Gamma(link="log"),data=d)
mod2 <- glm(y~x,family=gaussian(link="log"),data=d)

Predictions:

mu1 <- predict(mod1, newdata=nd, type="response")
mu2 <- predict(mod2, newdata=nd, type="response")

Extract shape/scale parameters:

sigma <- sqrt(summary(mod2)$dispersion)
shape <- MASS::gamma.shape(mod1)$alpha

Root mean squared error:

rmse <- function(x1,x2) sqrt(mean((x1-x2)^2))
rmse(mu1,nd$y)  ## 5.845
rmse(mu2,nd$y)  ## 5.842

Negative log likelihoods:

-sum(dgamma(nd$y,shape=shape,scale=mu1/shape,log=TRUE))  ## 276.84
-sum(dnorm(nd$y,mean=mu2,sd=sigma,log=TRUE))  ## 318.4

Comparing GLM models using predict

1 Answers1