
Is it possible to train a model that has multiple continuous outputs (multi-output regression)? What would be the objective when training such a model?

Thanks in advance for any suggestions

user1782011
  • If the output is more than one value, then you need a sequence model like RNN (GRU, LSTM etc.). [Keras](https://keras.io/) can help you quickly prototype such models. – uyaseen Sep 16 '16 at 21:33
  • I'm aware of RNNs. I was wondering if such a thing was also possible in XGBoost, since I already know that boosted trees perform well for my problem space. I should also note that my output vector size can be fixed. – user1782011 Sep 17 '16 at 07:19
  • If the relations between the outputs are known, you should be able to implement an objective function taking advantage of that. It has been done for [random forest with linear relation](https://cran.r-project.org/web/packages/MultivariateRandomForest/MultivariateRandomForest.pdf). And the XGBoost author thinks [it is doable](https://github.com/dmlc/xgboost/issues/680). – Adrien Renaud Sep 17 '16 at 09:55
  • @uyaseen this is not true; that only applies when there is a variable number of outputs (and even that is not necessarily true). You can have multiple outputs and calculate a summed loss over them – Jan van der Vegt Jan 19 '17 at 08:27

6 Answers


My suggestion is to use sklearn.multioutput.MultiOutputRegressor as a wrapper around xgb.XGBRegressor. MultiOutputRegressor trains one regressor per target and only requires that the regressor implement fit and predict, which xgb.XGBRegressor happens to support.

import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:linear')).fit(X, y)

# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))  # 0.004, 0.003, 0.005

This is probably the easiest way to regress multi-dimensional targets with xgboost, as you would not need to change any other part of your code (if you were using the scikit-learn API originally).

However, this method does not leverage any relationships between the targets, since each regressor is trained independently. If such relations are known, you could try to design a customized objective function that exploits them.

ComeOnGetMe
  • What if some x's have 1 target and some 50? – Eran Oct 24 '18 at 21:14
  • @Eran what do you mean by 'some 50'? – ComeOnGetMe Dec 20 '18 at 19:57
  • I mean that the number of outputs for each sample is not fixed. – Eran Dec 28 '18 at 16:31
  • Can you give me an example? A regression problem should have a fixed shape of input and output. Otherwise it's not one single problem and should be treated separately. – ComeOnGetMe Feb 27 '19 at 07:45
  • For example, you are asked to predict location/s (say in an array of locations) of different particles, and it is sometimes valid that some particles have more than one location (say up to 6 valid locations). – Eran Feb 28 '19 at 07:48
  • Perhaps you might need to use a technique such as "feature hashing" (the hashing trick) – Santino Mar 01 '19 at 22:14
  • This isn't a model with two outputs; it's a wrapper around two models. – Itamar Mushkin Mar 09 '21 at 09:21
  • MultiOutputRegressor doesn't update validation dataset – mirik Jul 22 '21 at 21:15
  • To whom it might be relevant: In the paper "Do we really need deep learning for time series forecasting?" the authors used sklearn's MultiOutputRegressor to perform multi-step time series forecasting (e.g. predicting the next 24 hours of values for some variable). The authors claim to beat many neural network based approaches with this approach. [Paper](https://arxiv.org/abs/2101.02118) & [code](https://github.com/Daniela-Shereen/GBRT-for-TSF) – KasperGL Aug 30 '22 at 14:01

Multiple output regression is now available in the nightly build of XGBoost, and will be included in XGBoost 1.6.0.

See https://github.com/dmlc/xgboost/blob/master/demo/guide-python/multioutput_regression.py for an example.

Jesse Anderson

The accepted answer's code generates a deprecation warning:

reg:linear is now deprecated in favor of reg:squarederror

so here is an updated version based on @ComeOnGetMe's answer:

import numpy as np 
import pandas as pd 
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# get some noised linear data
X = np.random.random((1000, 10))
a = np.random.random((10, 3))
y = np.dot(X, a) + np.random.normal(0, 1e-3, (1000, 3))

# fitting
multioutputregressor = MultiOutputRegressor(xgb.XGBRegressor(objective='reg:squarederror')).fit(X, y)

# predicting
print(np.mean((multioutputregressor.predict(X) - y)**2, axis=0))

Out:

[2.00592697e-05 1.50084441e-05 2.01412247e-05]
ah bon

I would place a comment but I lack the reputation. In addition to @Jesse Anderson's answer: to install the most recent version, select the top link from here: https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html?prefix=master/

Make sure to select the one for your operating system.

Use pip to install the wheel, e.g. for macOS:

pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/xgboost-1.6.0.dev0%2B4d81c741e91c7660648f02d77b61ede33cef8c8d-py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl

Sijmen

Based on the above discussion, I have extended the univariate XGBoostLSS to a multivariate framework called Multi-Target XGBoostLSS Regression that models multiple targets and their dependencies in a probabilistic regression setting. Code follows soon.


You can use linear regression, random forest regressors, and some other related algorithms in scikit-learn to produce multi-output regression directly; I am not sure about XGBoost. The gradient boosting regressor in scikit-learn does not allow multiple outputs. For those asking when this may be necessary: one example is forecasting multiple steps of a time series ahead.
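For instance, scikit-learn's RandomForestRegressor accepts a 2-D target matrix directly, with no wrapper needed (a minimal illustrative sketch):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.random.random((500, 8))
# two targets stacked into a (n_samples, 2) matrix
y = np.column_stack([X.sum(axis=1), X[:, 0] - X[:, 1]])

# a single forest fitted on both targets at once
rf = RandomForestRegressor(n_estimators=20).fit(X, y)
print(rf.predict(X).shape)  # (500, 2)
```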

Schrewd