I am modelling claims frequency (Poisson distribution) in R using the gbm and xgboost packages, but xgboost does not seem to have an offset parameter to take the exposure into account.

In a gbm, one would take the exposure into account as follows:

gbm.fit(x = train, y = target, n.trees = 100, distribution = "poisson", offset = log(exposure))

How do I achieve the same with xgboost?

PS: I cannot use the exposure as a predictor, since a new observation is created each time a claim is observed.

TheLittleSun
  • If you post some data, I'll take a crack at it (but after some sleep). There are two different formulations of Poisson regression that can be used in glm, and only one of them requires an offset. Perhaps that will also work in gbm and/or xgboost. – IRTFM Jan 20 '16 at 09:59

2 Answers

Once you have created your xgboost matrix (an xgb.DMatrix), you can set an offset using setinfo and the base_margin attribute, e.g.:

setinfo(xgtrain, "base_margin", log(d$exposure))

For a full example, see the similar question I asked: "XGBoost - Poisson distribution with varying exposure / offset".
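A minimal end-to-end sketch of the same idea, in case it helps. The simulated data, the variable names (x1, x2, exposure, claims) and the hyperparameters are made up for illustration and are not from the question. Note that base_margin lives on the DMatrix rather than on the model, so it has to be set again on whatever matrix you predict on.

```r
library(xgboost)

# Simulated data (hypothetical): claim counts proportional to exposure
set.seed(1)
n <- 2000
x1 <- runif(n)
x2 <- runif(n)
exposure <- runif(n, min = 0.1, max = 1)
claims <- rpois(n, lambda = exposure * exp(-1 + 0.8 * x1))
X <- cbind(x1 = x1, x2 = x2)

# Training matrix with log(exposure) as the offset (base margin)
dtrain <- xgb.DMatrix(data = X, label = claims)
setinfo(dtrain, "base_margin", log(exposure))

# Illustrative settings only, not tuned
params <- list(objective = "count:poisson", eta = 0.1, max_depth = 2)
bst <- xgb.train(params = params, data = dtrain, nrounds = 50)

# Expected claim counts: set the same offset on the prediction matrix
dnew <- xgb.DMatrix(data = X)
setinfo(dnew, "base_margin", log(exposure))
pred_counts <- predict(bst, dnew)

# Expected frequency per unit of exposure: offset log(1) = 0
dunit <- xgb.DMatrix(data = X)
setinfo(dunit, "base_margin", rep(0, n))
pred_rate <- predict(bst, dunit)
```

Because of the log link, pred_counts should equal pred_rate * exposure up to floating-point error, which is a quick way to check that the offset is actually being applied.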

Pete Lowth

Normalize your count by exposure (that is, use claims / exposure as the target) and use the exposure as the observation weight. See this answer for further details.
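This works because a Poisson fit to the rate with exposure weights has the same estimating equations as a fit to the count with a log(exposure) offset. Here is a small sketch that checks this with base R's glm on simulated data. The data and variable names are hypothetical, and the check is done in glm rather than xgboost, so treat it as an illustration of the principle only.

```r
# Simulated data (hypothetical)
set.seed(42)
n <- 5000
x <- runif(n)
exposure <- runif(n, min = 0.1, max = 1)
claims <- rpois(n, lambda = exposure * exp(-1 + 0.5 * x))

# Formulation 1: model the count, with log(exposure) as an offset
fit_offset <- glm(claims ~ x, family = poisson(), offset = log(exposure))

# Formulation 2: model the rate, with exposure as the weight.
# glm warns about non-integer responses here; the warnings are suppressed
# because they do not affect the coefficient estimates.
rate <- claims / exposure
fit_weight <- suppressWarnings(
  glm(rate ~ x, family = poisson(), weights = exposure)
)

# The two sets of coefficients should agree up to numerical tolerance
coef_offset <- coef(fit_offset)
coef_weight <- coef(fit_weight)
max_abs_diff <- max(abs(coef_offset - coef_weight))
```

In xgboost the analogous setup would presumably be xgb.DMatrix(data = X, label = claims / exposure, weight = exposure) with objective = "count:poisson". The model then predicts a frequency per unit of exposure, so multiply by the exposure to get expected counts. Tree-specific settings such as min_child_weight may interact with weights differently than with an offset, so the two approaches are not guaranteed to produce identical boosted models.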

Vinh Nguyen