
I am trying to use parsnip, together with a recipe, to fit an xgboost Poisson regression model with a log offset. To set up the Poisson regression, I can specify the objective in set_engine, which works nicely:

# Specify recipe
my_recipe <- recipe(Count ~ ., data = training_df) %>%
      # Remove covariates whose absolute correlation exceeds 0.8
      step_corr(all_predictors(), threshold = 0.8) %>%
      step_center(all_predictors(), -all_outcomes()) %>%
      step_scale(all_predictors(), -all_outcomes())
                    
# Specify xgboost config
tune_spec <- boost_tree(
  trees = 100) %>%
  set_engine("xgboost", objective='count:poisson') %>%
  set_mode("regression") %>%
  translate()

Looking at the documentation for xgboost and this example here, it seems that the following approach is recommended for specifying an offset:

setinfo(xgtrain, "base_margin", log(training_df$my_offset))

I'm not sure how to include this in set_engine above. Specifically, I'm not sure how to relate xgtrain to the data frame training_df.

Anthony W
    Because of the way the data is transformed on its way through parsnip to xgboost, it is unfortunately not straightforward to use `setinfo()` on a dataset like that. We don't currently support Poisson regression very directly for xgboost in tidymodels. If you would like to [open an issue](https://github.com/tidymodels/parsnip/issues), we can track interest in supporting this. – Julia Silge Jul 21 '20 at 03:43

1 Answer


The question is somewhat old, but since case weights have recently come to tidymodels, I would like to present a way in which Poisson regression on rate data via xgboost should now be possible with parsnip.

According to this blog post, because of how xgboost works, setting the log offset and predicting the counts is equivalent to using weights and predicting the rates. So, you should be able to perform the analysis via parsnip by setting importance weights (see the announcement post linked above) and choosing the rates, not the counts, as the target variable in the formula. Here the rate is the count divided by the offset variable, and the weight is the offset variable itself.
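A minimal sketch of that idea, assuming a tidymodels release with case-weight support and the xgboost package installed. The simulated data stands in for training_df, and the column names x1, x2, Rate, and case_wts are hypothetical, not from the question. I have not checked this against the offset-based fit, so treat it as a starting point rather than a verified recipe.

```r
library(tidymodels)

set.seed(1)
n <- 500

# Simulated stand-in for training_df: two predictors, an exposure
# column (my_offset), and a Poisson count that scales with exposure.
training_df <- tibble(
  x1 = rnorm(n),
  x2 = rnorm(n),
  my_offset = runif(n, min = 0.5, max = 5)
) %>%
  mutate(
    Count = rpois(n, lambda = my_offset * exp(0.3 * x1 - 0.2 * x2)),
    # Target is the rate (count per unit of exposure), not the count
    Rate = Count / my_offset,
    # Exposure becomes an importance weight instead of a log offset
    case_wts = hardhat::importance_weights(my_offset)
  )

# Same engine setting as in the question: Poisson objective
rate_spec <- boost_tree(trees = 100) %>%
  set_engine("xgboost", objective = "count:poisson") %>%
  set_mode("regression")

# The formula names the predictors explicitly so that Count and
# my_offset do not leak into the model as features.
rate_wf <- workflow() %>%
  add_formula(Rate ~ x1 + x2) %>%
  add_model(rate_spec) %>%
  add_case_weights(case_wts)

rate_fit <- fit(rate_wf, data = training_df)

# Predictions are rates; multiply by exposure to get expected counts
rate_preds <- predict(rate_fit, new_data = training_df)
expected_counts <- rate_preds$.pred * training_df$my_offset
```

If you prefer the recipe from the question, the same weights column should be picked up by the recipe as a case-weights column rather than a predictor, but my_offset and Count would still need to be kept out of the predictors.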

M.Doerner