
I'm looking for a way to fix the coefficient of a predictor variable. When I run a glm with my current data, the coefficient for one of my variables is close to one. I'd like to set it at .8.

I know this will give me a lower R^2 value, but I know a priori that the predictive power of the model will be greater.

The weights argument of glm looks promising, but I haven't figured it out yet.

Any help would be greatly appreciated.

Burton Guster

1 Answer


I believe you are looking for the offset argument to glm. For example, you might do something like this:

glm(y ~ x1, offset = x2,...)

where the coefficient of x2 is fixed at 1. In your case, you could multiply that column by 0.8 first, i.e. use offset = 0.8 * x2.
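A minimal sketch of fixing a coefficient at 0.8 instead of 1, on hypothetical simulated data (the true coefficients 0.5, -1 and 0.8 and the object names `sim`, `fit_arg`, `fit_fixed` are illustrative assumptions, not from the question). It fits the model once with the offset argument and once with offset() in the formula, then checks that the linear predictor is the estimated part plus the fixed 0.8 * x2 term:

```r
set.seed(1)
n <- 500
# Hypothetical simulated data; the true coefficients (0.5, -1, 0.8) are arbitrary
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- rbinom(n, size = 1, prob = plogis(0.5 - sim$x1 + 0.8 * sim$x2))

# Offset given as an argument: 0.8 * x2 is evaluated inside `data`
fit_arg <- glm(y ~ x1, data = sim, offset = 0.8 * x2, family = binomial)

# The same model with the offset written in the formula instead
fit_fixed <- glm(y ~ x1 + offset(0.8 * x2), data = sim, family = binomial)

# Only the intercept and x1 are estimated; x2's coefficient is held at 0.8
coef(fit_fixed)
all.equal(coef(fit_arg), coef(fit_fixed))

# The linear predictor is the estimated part plus the fixed 0.8 * x2 term
b <- coef(fit_fixed)
lp <- b[["(Intercept)"]] + b[["x1"]] * sim$x1 + 0.8 * sim$x2
all.equal(lp, unname(predict(fit_fixed, type = "link")))
```

Note that x2 appears only inside the offset here; it is not also listed as an ordinary predictor.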

To expand, here is what ?glm says about the offset argument:

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

So you can also add offsets in the model formula itself using the offset() function. Here is a simple example illustrating its use:

set.seed(123)

d <- data.frame(y = factor(sample(0:1, size = 100, replace = TRUE)),
                x1 = runif(100), x2 = runif(100))

glm1 <- glm(y ~ x1 + x2, data = d, family = binomial)
coef(glm1)

(Intercept)          x1          x2 
  0.4307718  -0.4128541  -0.6994810 

glm2 <- glm(y ~ x1, data = d, offset = x2, family = binomial)
coef(glm2)

(Intercept)          x1 
 -0.4963699  -0.2185571 

glm3 <- glm(y ~ x1 + offset(x2), data = d, family = binomial)
coef(glm3)

(Intercept)          x1 
 -0.4963699  -0.2185571 

Note that the last two have the same coefficients.
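The ?glm passage quoted above also says that when more than one offset is specified, their sum is used. A quick sketch to check that, regenerating d the same way as in the example; the 0.5/0.3 split of the 0.8 multiplier and the names `glm_split`/`glm_single` are arbitrary choices for illustration:

```r
set.seed(123)
d <- data.frame(y = factor(sample(0:1, size = 100, replace = TRUE)),
                x1 = runif(100), x2 = runif(100))

# One offset via the argument and another via offset() in the formula;
# glm() adds them, so together they contribute 0.5 * x2 + 0.3 * x2
glm_split <- glm(y ~ x1 + offset(0.3 * x2), data = d,
                 offset = 0.5 * x2, family = binomial)

# A single offset of 0.8 * x2 should therefore give the same fit
glm_single <- glm(y ~ x1 + offset(0.8 * x2), data = d, family = binomial)

all.equal(coef(glm_split), coef(glm_single))
```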

joran
  • Yeah, you're right, I need to use offset. I'm having some trouble actually using it, though. When I run glm(y~x1...., offset = x2*.8), my new coefficient is around .6, which doesn't seem to make sense. Also, when I plot the predictions from the new glm object, the line is exactly the same as when I had no offset argument. – Burton Guster Nov 22 '11 at 23:23
  • Also, in the help page for glm I see an example that uses offset, but it just puts the variable inside the offset() function, and I can't see where they specify how much to offset that variable by... – Burton Guster Nov 22 '11 at 23:24
  • @BurtonGuster Without being able to sit at your computer and see exactly what your data are and which models you're fitting, it's impossible for me to know what would "make sense" or not. If the difference between the two models is small, the visual difference in fitted lines may be difficult to see with the naked eye. – joran Nov 22 '11 at 23:31
  • Hahaha, fair enough. I'll try to spell it out more clearly. The current coefficient for the variable in question is 1.30. When I use offset = x2*.3, the resulting coefficient is 1.1. When I multiply x2 by .8, the coefficient is .6. Maybe it's taking off 80% of the original, but that doesn't work mathematically. – Burton Guster Nov 22 '11 at 23:36
  • The other issue is that when I run the fitted model through predict, the predictions are exactly the same regardless of what I do with offset, so I'm clearly doing something wrong. I'll see what I can find on it. – Burton Guster Nov 22 '11 at 23:37
  • @BurtonGuster If you run `all.equal(predict(glm1),predict(glm2))` on my example, you'll see that you do get different fitted values. Again, it's impossible for me to explain what you're seeing without more information than is suitable to throw around in the comments. Work on it some more and if you get stuck again, ask a separate question. – joran Nov 22 '11 at 23:41
  • @BurtonGuster: the other thing to keep in mind if you are using a `family` other than `gaussian` (i.e. you are really running a *generalized* linear model rather than a *general* one, which is what `glm` is generally intended for ...) is that the offset is applied on the scale of the linear predictor, so you might have to use (e.g. if you are using a log link) `offset(log(0.8*x))` rather than `offset(0.8*x)`. But I can't tell from the detail you have provided ... – Ben Bolker Nov 23 '11 at 01:25
  • Thanks for the help, guys. What turned out to fix my problem was to put the offset in the model formula. So instead of (y~x1+x2, offset=x2) I used (y~x1+offset(x2)). Thanks again. – Burton Guster Nov 23 '11 at 17:19
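The symptoms described in these comments (the x2 coefficient dropping roughly by the offset multiplier, and unchanged predictions) are consistent with x2 being kept as an ordinary predictor *and* also supplied as an offset. In that case the model is only reparameterized: the estimated x2 coefficient shifts down by the multiplier and the fitted values stay the same. Putting x2 only inside offset(), as in the final comment, actually fixes its coefficient. A sketch on hypothetical simulated data (the coefficients and the names `free`, `both`, `fixed`, `nd` are illustrative assumptions, not the poster's data):

```r
set.seed(42)
n <- 500
sim <- data.frame(x1 = runif(n), x2 = runif(n))
# Illustrative response; the coefficients here are arbitrary assumptions
sim$y <- rbinom(n, size = 1, prob = plogis(-0.5 + sim$x1 + 1.3 * sim$x2))

# x2 estimated freely
free  <- glm(y ~ x1 + x2, data = sim, family = binomial)
# x2 as a predictor AND as an offset: only a reparameterization
both  <- glm(y ~ x1 + x2, data = sim, offset = 0.8 * x2, family = binomial)
# x2 only in the offset: its coefficient is genuinely fixed at 0.8
fixed <- glm(y ~ x1 + offset(0.8 * x2), data = sim, family = binomial)

# The free x2 estimate minus the x2 estimate in `both` is the 0.8 multiplier
coef(free)[["x2"]] - coef(both)[["x2"]]
# ...and the fitted probabilities are the same
all.equal(fitted(free), fitted(both))

# Predictions on new data include the formula offset term
nd <- data.frame(x1 = 0.5, x2 = 0.25)
eta_manual <- coef(fixed)[["(Intercept)"]] + coef(fixed)[["x1"]] * 0.5 + 0.8 * 0.25
all.equal(unname(predict(fixed, newdata = nd, type = "link")), eta_manual)
```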