Is it possible to train a model by xgboost that has multiple continuous outputs (multi-regression)? What would be the objective of training such a model?
Thanks in advance for any suggestions
My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper of xgb.XGBRegressor. MultiOutputRegressor trains one regressor per target and only requires that the regressor implements fit and predict, which xgboost happens to support.
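import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor
# get some noised linear data: 1000 samples, 10 features, 3 targets
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))
# fitting: one XGBRegressor is trained per target column
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:linear')).fit(X, y)
# predicting: per-target mean squared error on the training data
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0)) # 0.004, 0.003, 0.005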
This is probably the easiest way to regress multi-dimensional targets with xgboost, since you do not need to change any other part of your code (if you were already using the sklearn API).
However, this method does not exploit any relation between the targets. To capture that, you could try designing a custom objective function.
Multiple output regression is now available in the nightly build of XGBoost, and will be included in XGBoost 1.6.0.
See https://github.com/dmlc/xgboost/blob/master/demo/guide-python/multioutput_regression.py for an example.
The code in @ComeOnGetMe's answer generates a warning: reg:linear is now deprecated in favor of reg:squarederror. Here is an updated version of that answer:
import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor
# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))
# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:squarederror')).fit(X, y)
# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))
Out:
[2.00592697e-05 1.50084441e-05 2.01412247e-05]
I would place a comment but I lack the reputation. To add to @Jesse Anderson's answer: to install the most recent nightly build, select the top link from here: https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html?prefix=master/
Make sure to select the one for your operating system.
Then use pip install to install the wheel. For example, on macOS:
pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/xgboost-1.6.0.dev0%2B4d81c741e91c7660648f02d77b61ede33cef8c8d-py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl
Based on the above discussion, I have extended the univariate XGBoostLSS to a multivariate framework called Multi-Target XGBoostLSS Regression that models multiple targets and their dependencies in a probabilistic regression setting. Code follows soon.
You can use linear regression, random forest regressors, and some other related algorithms in scikit-learn to produce multi-output regression. Not sure about XGBoost. The boosting regressor in scikit-learn does not allow multiple outputs. For people who asked when this may be necessary: one example would be forecasting multiple steps ahead in a time series.
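For the time-series case, a minimal sketch of multi-step-ahead forecasting framed as multi-output regression might look like this. The sine series, the window length of 8 and the horizon of 3 are made-up values for illustration. RandomForestRegressor is given the 2-D target directly, while GradientBoostingRegressor is wrapped in MultiOutputRegressor.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Toy series; window and horizon are illustrative assumptions
series = np.sin(np.arange(300) * 0.1)
window, horizon = 8, 3

# Each row of X holds `window` past values; each row of y holds the next `horizon` values
n_samples = len(series) - window - horizon + 1
X = np.array([series[i:i + window] for i in range(n_samples)])
y = np.array([series[i + window:i + window + horizon] for i in range(n_samples)])

# Random forest handles the 2-D target natively
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Gradient boosting is wrapped so that one model is fitted per forecast step
boosted = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=50, random_state=0)
).fit(X, y)

# Forecast the next `horizon` steps from the most recent window
last_window = series[-window:].reshape(1, -1)
forest_forecast = forest.predict(last_window)
boosted_forecast = boosted.predict(last_window)
print(forest_forecast.shape, boosted_forecast.shape)
```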