I am building a simple GLM model as follows:

model1 = glm(y ~ x1 + x2 + x3, data=train)

I then use the predict function to score new data:

newpred = predict(object=model1, newdata= validation, type = 'term')

By specifying the option type = 'term' I was hoping to get the individual term predictions (i.e., beta1 * x1, beta2 * x2, etc.). However, it turns out that type = 'term' returns centered predictions: each term's column is shifted so that it is centered at 0 (as explained here: What does predict.glm(, type="terms") actually do?).
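To make the centering concrete, here is a minimal sketch on the built-in mtcars data (not the model from this question; the formula and the names fit, centered and const are chosen only for illustration). It relies on the "constant" attribute that predict() attaches to the terms matrix.

```r
# Illustrative model on built-in data (an assumption, not the asker's model)
fit <- glm(mpg ~ wt + factor(cyl), data = mtcars)

# Centered term contributions, one column per model term
centered <- predict(fit, newdata = mtcars, type = "terms")

# The amount removed by centering is stored as an attribute
const <- attr(centered, "constant")

# Over the training data, each term column averages to (numerically) zero
colMeans(centered)

# Adding the constant back to the row sums recovers the linear predictor
link <- predict(fit, newdata = mtcars, type = "link")
all.equal(unname(rowSums(centered) + const), unname(link))
```

So the centered terms plus the stored constant still add up to the usual prediction on the link scale; only the split between the terms and the constant differs from the plain beta * x products.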

My question is whether there is a simple way to get the plain-vanilla (uncentered) term predictions rather than the centered ones. The model has categorical variables, and I want a single term for each categorical variable (as in the output of type = 'term') rather than a series of dummy indicator variables.

PingPong

1 Answer

If your model is really that simple (e.g., only simple continuous predictor variables) then I feel like

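# Design matrix: intercept plus one column per coefficient
X <- model.matrix(formula(model1), data = train)
# Multiply each column by its coefficient
sweep(X, MARGIN = 2, STATS = coef(model1), FUN = "*")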

should work (I haven't tested); a lot of the complex internal machinery of predict(.,"terms") is for collecting columns that belong to the same "term" (e.g. a set of polynomial or spline coefficients).
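For models with factors, the column-collecting step can be sketched as follows. This is an untested-in-production illustration under stated assumptions, not the internals of predict(): the train and validation data are simulated here, and the names tt, mf, contrib, assign_idx and raw_terms are made up for the example. The idea is to use the "assign" attribute of the model matrix, which maps each column to the term it came from, and sum beta * x within each term.

```r
# Simulated data (assumption): one numeric and two 3-level factor predictors
set.seed(1)
train <- data.frame(
  x1 = rnorm(60),
  x2 = factor(rep(c("a", "b", "c"), 20)),
  x3 = factor(rep(c("u", "v", "w", "u"), 15))
)
train$y <- 1 + 0.5 * train$x1 + as.integer(train$x2) + rnorm(60)
validation <- train[1:10, ]

model1 <- glm(y ~ x1 + x2 + x3, data = train)

# Build the design matrix for the new data with the training factor levels
tt <- delete.response(terms(model1))
mf <- model.frame(tt, validation, xlev = model1$xlevels)
X  <- model.matrix(tt, mf, contrasts.arg = model1$contrasts)

# Per-column contributions beta_j * x_ij
contrib <- sweep(X, MARGIN = 2, STATS = coef(model1), FUN = "*")

# "assign" gives the term index of each column (0 is the intercept)
assign_idx  <- attr(X, "assign")
term_labels <- attr(tt, "term.labels")

# Sum the dummy columns belonging to each term: one uncentered column per term
raw_terms <- sapply(seq_along(term_labels), function(i) {
  rowSums(contrib[, assign_idx == i, drop = FALSE])
})
colnames(raw_terms) <- term_labels
```

As a sanity check, rowSums(raw_terms) plus the intercept should match predict(model1, newdata = validation, type = "link"), and each column should differ from the type = "terms" output only by a per-term constant.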

Ben Bolker
  • That's exactly my struggle. The model is not that simple. I have many categorical variables that are represented as a series of indicator variables in the design matrix. And that's exactly what I am trying to achieve: I want to have a single term for each predictor (rather than n-1 dummy indicators). I will clarify in my original post. – PingPong Sep 27 '20 at 18:30
  • OK, you'll have to reinvent a subset of the `predict` function. I can help, but it will be a lot easier if you can show a reproducible example (as long as it has at least two different factor predictors with >2 levels, that should be sufficient to show the idea) – Ben Bolker Sep 27 '20 at 18:40
  • Thank you for your kind offer. I found a workaround for my problem. It remains confusing that the predict() function computes centered term estimates (and that this is not documented) rather than the uncentered estimates – PingPong Sep 28 '20 at 05:15