I would like to predict multiple dependent variables using multiple predictors. If I understood correctly, in principle one could make a bunch of linear regression models that each predict one dependent variable, but if the dependent variables are correlated, it makes more sense to use multivariate regression. I would like to do the latter, but I'm not sure how.
So far I haven't found a Python package that specifically supports this. I've tried scikit-learn, and even though its linear regression example only shows the case where y is a one-dimensional array (one dependent variable per observation), it seems able to handle multiple y columns as well. But when I compare the output of this "multivariate" fit to the results I get by manually looping over the dependent variables and predicting each one independently of the others, the outcomes are exactly the same. I don't think that should be the case, because some of the dependent variables are strongly correlated (>0.5).
The code just looks like this, with y either an n x 1 or an n x m matrix, and x and newx matrices of various sizes (the number of rows in x equals n).
from sklearn import linear_model

ols = linear_model.LinearRegression()
ols.fit(x, y)  # y is either (n, 1) or (n, m)
ols.predict(newx)
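For completeness, here is a minimal, self-contained sketch of the comparison I mean; the random data and shapes are only illustrative, not my actual data:

import numpy as np
from sklearn import linear_model

# Illustrative data: 100 observations, 3 predictors, 2 correlated responses
rng = np.random.RandomState(0)
x = rng.randn(100, 3)
y = x @ rng.randn(3, 2) + 0.1 * rng.randn(100, 2)
newx = rng.randn(10, 3)

# Fit once with the full (n x m) response matrix
ols = linear_model.LinearRegression()
ols.fit(x, y)
pred_joint = ols.predict(newx)

# Fit each response column separately and stack the predictions
pred_separate = np.column_stack([
    linear_model.LinearRegression().fit(x, y[:, j]).predict(newx)
    for j in range(y.shape[1])
])

# In my tests the two sets of predictions coincide (up to floating-point error)
print(np.allclose(pred_joint, pred_separate))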
Does this function actually perform multivariate regression?