multioutput regression by xgboost

Question

Is it possible to train a model with xgboost that has multiple continuous outputs (multi-output regression)? What would be the objective for training such a model?

Thanks in advance for any suggestions

If the output is more than one value, then you need a sequence model like RNN (GRU, LSTM etc.). [Keras](https://keras.io/) can help you quickly prototype such models. — uyaseen, Sep 16 '16 at 21:33
I'm aware of RNNs. I was wondering if such a thing is also possible in XGBoost, since I already know that boosted trees perform well for my problem space. I should also note that my output vector size can be fixed. — user1782011, Sep 17 '16 at 07:19
If the relations between the outputs are known, you should be able to implement an objective function taking advantage of that. It has been done for [random forest with linear relation](https://cran.r-project.org/web/packages/MultivariateRandomForest/MultivariateRandomForest.pdf). And the XGBoost author thinks [it is doable](https://github.com/dmlc/xgboost/issues/680). — Adrien Renaud, Sep 17 '16 at 09:55
@uyaseen this is not true; that applies only when there is a variable number of outputs (and even then not necessarily). You can have multiple outputs and calculate a summed loss over them. — Jan van der Vegt, Jan 19 '17 at 08:27
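To make the "summed loss" idea from the last comment concrete, here is a minimal sketch assuming a plain squared-error loss summed over all outputs. The function names are made up for illustration. A boosting objective needs the first and second derivatives of the loss with respect to each prediction; for this loss they are computed independently per output. How these arrays would be passed to XGBoost depends on the version and is not shown here.

```python
import numpy as np


def summed_squared_error(pred, y):
    # Loss summed over all samples and all outputs (hypothetical helper).
    return 0.5 * np.sum((pred - y) ** 2)


def grad_hess(pred, y):
    # First and second derivative of the loss w.r.t. each prediction.
    # Both arrays have shape (n_samples, n_outputs).
    grad = pred - y
    hess = np.ones_like(pred)
    return grad, hess


rng = np.random.default_rng(0)
y = rng.normal(size=(5, 3))
pred = rng.normal(size=(5, 3))
grad, hess = grad_hess(pred, y)

# Finite-difference check of a single entry of the gradient.
eps = 1e-6
bumped = pred.copy()
bumped[2, 1] += eps
numeric = (summed_squared_error(bumped, y) - summed_squared_error(pred, y)) / eps
print(np.isclose(numeric, grad[2, 1], atol=1e-4))
```

Because the gradient of each output depends only on that output, this particular loss does not couple the targets; an objective that exploits relations between outputs would need cross terms.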

score 66 · Answer 1 · edited May 01 '23 at 03:31

My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper of xgb.XGBRegressor. MultiOutputRegressor trains one regressor per target and only requires that the regressor implements fit and predict, which xgboost happens to support.

import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# generate noisy linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:linear')).fit(X, y)

# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))  # 0.004, 0.003, 0.005

This is probably the easiest way to regress multi-dimensional targets using xgboost, as you would not need to change any other part of your code (if you were using the sklearn API originally).

However, this method does not exploit any relation between the targets, because each target gets its own independent model. To use such relations, you can try to design a customized objective function.

answered Dec 07 '17 at 00:29 by ComeOnGetMe · edited May 01 '23 at 03:31 by Mario

What if some x's have 1 target and some 50? – Eran Oct 24 '18 at 21:14
@Eran what do you mean by 'some 50'? – ComeOnGetMe Dec 20 '18 at 19:57
I mean that the number of outputs for each sample is not fixed. – Eran Dec 28 '18 at 16:31
Can you give me an example? A regression problem should have fixed shape of input and output. Otherwise it's not one single problem and should be treated separately. – ComeOnGetMe Feb 27 '19 at 07:45
For example, you are asked to predict the location(s) (say, in an array of locations) of different particles, and it is sometimes valid for a particle to have more than one location (say, up to 6 valid locations). – Eran Feb 28 '19 at 07:48
Perhaps you might need a technique such as "feature hashing" (the hashing trick). – Santino Mar 01 '19 at 22:14
This isn't a model with two outputs; it's a wrapper around two models. – Itamar Mushkin Mar 09 '21 at 09:21
MultiOutputRegressor doesn't update validation dataset – mirik Jul 22 '21 at 21:15
To whom it might be relevant: In the paper "Do we really need deep learning for time series forecasting?" the authors used sklearn's MultiOutputRegressor to perform multi-step time series forecasting (e.g. predicting the next 24 hours of values for some variable). The authors claim to beat many neural network based approaches with this approach. [Paper](https://arxiv.org/abs/2101.02118) & [code](https://github.com/Daniela-Shereen/GBRT-for-TSF) – KasperGL Aug 30 '22 at 14:01

score 11 · Answer 2 · answered Feb 26 '22 at 22:38

Multiple output regression is now available in the nightly build of XGBoost, and will be included in XGBoost 1.6.0.

See https://github.com/dmlc/xgboost/blob/master/demo/guide-python/multioutput_regression.py for an example.

answered Feb 26 '22 at 22:38 by Jesse Anderson

This is great news! Thank you for sharing! – Adib Mar 01 '22 at 10:42
Any difference when I fit each model for each y target independently? – olivia Apr 29 '22 at 02:08
Apparently, this library implementation still builds separate models for each output rather than a shared one: https://github.com/dmlc/xgboost/blob/master/doc/tutorials/multioutput.rst – Marcin Wojnarski May 20 '22 at 09:49

score 6 · Answer 3 · edited May 01 '23 at 19:24

The code in @ComeOnGetMe's answer generates a warning:

reg:linear is now deprecated in favor of reg:squarederror

so here is an updated version based on that answer:

import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:squarederror')).fit(X, y)

# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))

Out:

[2.00592697e-05 1.50084441e-05 2.01412247e-05]

score 3 · Answer 4 · answered Mar 22 '22 at 09:39

I would post this as a comment, but I lack the reputation. In addition to @Jesse Anderson's answer: to install the most recent version, select the top link from here: https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html?prefix=master/

Make sure to select the one for your operating system.

Use pip install to install the wheel, e.g. for macOS:

pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/xgboost-1.6.0.dev0%2B4d81c741e91c7660648f02d77b61ede33cef8c8d-py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl

score 1 · Answer 5 · edited Oct 14 '22 at 13:25

Based on the above discussion, I have extended the univariate XGBoostLSS to a multivariate framework called Multi-Target XGBoostLSS Regression that models multiple targets and their dependencies in a probabilistic regression setting. Code follows soon.

answered Oct 14 '22 at 13:22 by StatMixedML

score 0 · Answer 6 · edited May 05 '23 at 11:59

You can use linear regression, random forest regressors, and some other related algorithms in scikit-learn to produce multi-output regression. I am not sure about XGBoost. The boosting regressor in scikit-learn does not allow multiple outputs. For those who asked when this may be necessary: one example is forecasting multiple steps ahead in a time series.
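A short sketch of that difference, assuming a reasonably recent scikit-learn: LinearRegression and RandomForestRegressor accept a 2-D target directly, while GradientBoostingRegressor is expected to reject it and needs a wrapper such as MultiOutputRegressor. The data and settings are arbitrary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 10))
a = rng.random((10, 3))
y = X @ a + rng.normal(0, 1e-3, (200, 3))

# These estimators handle a 2-D target natively.
linear_pred = LinearRegression().fit(X, y).predict(X)
forest_pred = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y).predict(X)

# The gradient boosting regressor expects a 1-D target.
try:
    GradientBoostingRegressor(n_estimators=20).fit(X, y)
    gbr_accepts_2d = True
except (ValueError, TypeError):
    gbr_accepts_2d = False

# Wrapping it fits one boosted model per target column.
wrapped_pred = (
    MultiOutputRegressor(GradientBoostingRegressor(n_estimators=20, random_state=0))
    .fit(X, y)
    .predict(X)
)
print(linear_pred.shape, forest_pred.shape, gbr_accepts_2d, wrapped_pred.shape)
```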

answered Nov 15 '21 at 13:05 by Schrewd · edited May 05 '23 at 11:59 by double-beep
