Multi-linear Regression with sklearn

Question

I have started working with sklearn and am trying to implement multilinear regression. I followed an example and tried to implement it the same way with my dataframe, but ended up getting:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample

Here is my code

# content - pandas object
# content - <class 'pandas.core.frame.DataFrame'>
x = content[['Feature 1', 'Feature 3']].values.reshape(-1,2)
y = content['Feature 2']
# 70 / 30 split
x_train, x_test, y_train, y_test = cross_validation.train_test_split(x, y, test_size=0.3)
model = LinearRegression()
model.fit(x_train, y_train)
y_predict = model.predict(x_test)
accuracy_score = model.score(y_test, y_predict)
return x_test, y_test, model.coef_, model.intercept_, y_predict, accuracy_score, x_train, y_train

When I add reshape(-1,1) to y = content['Feature 2'], I get ValueError: shapes (3,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0). I am pretty sure that I am making a trivial error, but I cannot figure out where.

The data is basically a set of features and a class: Feature 1, Feature 2, Feature 3, Feature 4, ..., Class. The individual features don't mean much; it is only a dummy dataset.

What I am trying to do is apply multilinear regression between x = [Feature 1, Feature 2] and y = [Feature 3].

I am not sure whether I should be doing something like a dot product between Feature 1 and Feature 2 to obtain something like [[number],[number],[number]].

And when I remove the .values.reshape(-1,2), I get ValueError: shapes (1,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0)

Not sure where I am going wrong.

Thank you very much for your help :)
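
Note: one likely cause, assuming the traceback points at the model.score line, is the argument order. LinearRegression.score takes the feature matrix and the true targets, score(X, y), and runs predict(X) itself, so model.score(y_test, y_predict) hands y_test to the model as if it were X. A minimal sketch of the corrected call follows. The DataFrame contents are made-up stand-in values, and train_test_split is imported from sklearn.model_selection on the assumption of a recent sklearn release (the cross_validation module used in the question was removed in version 0.20).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for `content`: 10 rows of made-up numbers.
rng = np.random.default_rng(0)
content = pd.DataFrame(
    rng.integers(50_000, 120_000, size=(10, 3)),
    columns=["Feature 1", "Feature 2", "Feature 3"],
)

x = content[["Feature 1", "Feature 3"]].values  # already 2-D: (10, 2)
y = content["Feature 2"].values                 # 1-D target: (10,)

# 70 / 30 split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(x_train, y_train)
y_predict = model.predict(x_test)

# score() expects (X, y_true) and calls predict(X) internally.
r2 = model.score(x_test, y_test)

# Passing y_test first makes sklearn treat it as X; a one-column "X"
# does not match a model that was fitted on two features.
try:
    model.score(y_test.reshape(-1, 1), y_predict)
    wrong_call_failed = False
except ValueError as err:
    wrong_call_failed = True
    print("wrong call raised:", err)
```

For a regressor, score returns the R^2 of the predictions rather than a classification accuracy, so a name such as r2 may be less misleading than accuracy_score.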

The head of my training set is - Instance Feature 1 Feature 2 Feature 3 Feature 4 Feature 5 ... — Bhargav Panth, Oct 23 '17 at 22:10
The error is not in `y` but X. Can you post some samples of your data? — Vivek Kumar, Oct 24 '17 at 01:45
X = [[ 66957 96087], [96030 108299]] and y = [[74432] [86875]] — Bhargav Panth, Oct 24 '17 at 12:26
What do the numbers represent? Is this two samples (rows) of data? Also, please post the complete code if possible along with more data — Vivek Kumar, Oct 24 '17 at 12:45
If `content` is an ordinary dataframe, there is no need for reshape, since every row would have some parameters and be matched to a prediction in y — chrisckwong821, Oct 24 '17 at 15:02
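
To illustrate the point about reshape, here is a small sketch built from the two sample rows quoted in the comments. It assumes, as in the question's code, that X holds Feature 1 and Feature 3 and y holds Feature 2; the DataFrame itself is a hypothetical reconstruction.

```python
import pandas as pd

# Hypothetical frame rebuilt from the two sample rows in the comments,
# assuming X = (Feature 1, Feature 3) and y = Feature 2.
content = pd.DataFrame({
    "Feature 1": [66957, 96030],
    "Feature 2": [74432, 86875],
    "Feature 3": [96087, 108299],
})

x = content[["Feature 1", "Feature 3"]]  # selecting with a list keeps it 2-D
y = content["Feature 2"]                 # a single column is a 1-D Series

print(x.shape)  # (2, 2)
print(y.shape)  # (2,)

# reshape(-1, 2) on an array that already has two columns changes nothing.
x_values = x.values
same = (x_values.reshape(-1, 2) == x_values).all()
print(same)
```

Selecting columns with a list already yields one row per sample and one column per feature, which is the layout fit and predict expect for X.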
