5

I want to train a linear model Y = M_1*X_1 + M_2*X_2 using sklearn with multidimensional input and output samples (e.g. vectors). I tried the following code:

from sklearn import linear_model
from pandas import DataFrame 

x1 = [[1,2],[2,3],[3,4]]
x2 = [[1,1],[3,2],[3,5]]
y = [[1,0],[1,2],[2,3]]
model = {
    'vec1': x1,
    'vec2': x2,
    'compound_vec': y}

df = DataFrame(model, columns=['vec1','vec2','compound_vec'])
x = df[['vec1','vec2']].astype(object)
y = df['compound_vec'].astype(object)
regr = linear_model.LinearRegression()
regr.fit(x,y)

But I get the following error:

regr.fit(x,y)
 ...
array = array.astype(np.float64)
ValueError: setting an array element with a sequence.

Does anyone know what is wrong with the code? And is this the right way to train Y = M_1*X_1 + M_2*X_2?

Mila
  • Is your goal, in the end, to also learn and predict multiple output values at once, as your first sentence may still suggest (so is Y multidimensional in the formula)? Or is it only reformatting the data (as done in the accepted answer)? – Marcus V. Aug 24 '18 at 12:08
  • @MarcusV. I need to train the model so that, given two multidimensional inputs (vectors), it predicts the output in the same space (a vector), so `M_1` and `M_2` are matrices. With one independent variable it works fine, but I am confused about how to handle two independent variables. – Mila Aug 24 '18 at 12:15
  • @Shimil: There is nothing to be confused about here. In `Y = M_1*X_1 + M_2*X_2`, for a given value of `X_1` and a given value of `X_2`, you will have a corresponding `Y` value. So if you have 6 pairs of `X_1` and `X_2` values, as you have in your data, you will have 6 output values of `Y`. – Sheldore Aug 24 '18 at 13:07
  • @Bazingaa it may still be that Shimil actually wants multiple outputs/dependent variables, but then linear regression won't work out of the box. It may work using the `MultiOutputRegressor` wrapper (`sklearn.multioutput.MultiOutputRegressor`), with the assumption that the outputs can be predicted independently (as it fits one model per output); see the sketch after these comments. – Marcus V. Aug 24 '18 at 13:52
  • Hmm, you are right. It could be. I just took the equation Shimil provided and tried to find out why the code was complaining. – Sheldore Aug 24 '18 at 13:54
  • @MarcusV. sorry, by multiple outputs, do you mean the vector representation of elements in `y` (e.g. `y_0 = [1,0]` in my example)? or `y` itself which consists of 3 elements (`y=[y_0,y_1,y_2]`)? – Mila Aug 24 '18 at 14:45
  • I meant the latter, I suppose (as in [this](http://oa.upm.es/40804/1/INVE_MEM_2015_204213.pdf) paper). Sorry, maybe I just didn't get your question, hence my questions. The title "multi-variate" was maybe misleading me. So if the answer is what you wanted, we can leave it at that. – Marcus V. Aug 25 '18 at 07:31
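
For the multi-output case raised in the comments above, the following is only a minimal sketch of the MultiOutputRegressor idea Marcus V. mentions; stacking the two input vectors into one four-feature row per sample is an assumption made here, not something stated in the question.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

x1 = np.array([[1, 2], [2, 3], [3, 4]])  # three 2-D input vectors
x2 = np.array([[1, 1], [3, 2], [3, 5]])  # three 2-D input vectors
y = np.array([[1, 0], [1, 2], [2, 3]])   # three 2-D output vectors

# Assumption: stack x1 and x2 side by side, so each sample is one 4-feature row
X = np.hstack([x1, x2])                  # shape (3, 4)

# MultiOutputRegressor fits one LinearRegression per output dimension of y
regr = MultiOutputRegressor(LinearRegression())
regr.fit(X, y)

print(regr.predict([[1, 2, 1, 1]]))      # predicts a 2-D output vector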

1 Answer

3

Just flatten your x1, x2 and y lists and you are good to go. One way to do that is to use NumPy arrays:

import numpy as np
x1 = np.array(x1).flatten()
x2 = np.array(x2).flatten()
y = np.array(y).flatten()

A second way to do it is using ravel:

x1 = np.array(x1).ravel()
x2 = np.array(x2).ravel()
y = np.array(y).ravel()

A third way, without using NumPy, is a list comprehension:

x1 = [j for i in x1 for j in i]
x2 = [j for i in x2 for j in i]
y = [j for i in y for j in i]

There might be more ways, but you get the idea of what the problem was. For more ways, you can have a look here
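
For completeness, here is one possible end-to-end sketch that reuses the variable names from the question and flattens the lists before building the DataFrame; it is just one way to put the pieces together, not the only one.

import numpy as np
from pandas import DataFrame
from sklearn import linear_model

x1 = np.array([[1, 2], [2, 3], [3, 4]]).flatten()
x2 = np.array([[1, 1], [3, 2], [3, 5]]).flatten()
y = np.array([[1, 0], [1, 2], [2, 3]]).flatten()

# Each flattened list now holds 6 scalars, so the DataFrame has 6 rows
df = DataFrame({'vec1': x1, 'vec2': x2, 'compound_vec': y},
               columns=['vec1', 'vec2', 'compound_vec'])

x = df[['vec1', 'vec2']]
y = df['compound_vec']

regr = linear_model.LinearRegression()
regr.fit(x, y)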

Output

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
Sheldore