Error in Python script "Expected 2D array, got 1D array instead:"?

Question

I'm following this tutorial to make this ML prediction:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style

style.use("ggplot")
from sklearn import svm

x = [1, 5, 1.5, 8, 1, 9]
y = [2, 8, 1.8, 8, 0.6, 11]

plt.scatter(x,y)
plt.show()

X = np.array([[1,2],
             [5,8],
             [1.5,1.8],
             [8,8],
             [1,0.6],
             [9,11]])

y = [0,1,0,1,0,1]
X.reshape(1, -1)

clf = svm.SVC(kernel='linear', C = 1.0)
clf.fit(X,y)

print(clf.predict([0.58,0.76]))

I'm using Python 3.6 and I get the error "Expected 2D array, got 1D array instead:". I think the script was written for an older version, but I don't know how to convert it for 3.6.

I already tried:

X.reshape(1, -1)

@stackoverflowuser2010: I'd guess the last line `clf.predict()`, since `X` is already two-dimensional (useless `reshape` notwithstanding). — Mark Dickinson, Aug 07 '17 at 19:08
@JonTargaryen: What version of scikit-learn are you using? This isn't supposed to become an error until version 0.19, which isn't released yet. — Mark Dickinson, Aug 07 '17 at 19:16
@JonTargaryen the reshape is in the right place, but you are discarding the result. Assign the result back to `X`. — Mad Physicist, Aug 07 '17 at 19:32
making long answer short: ```regr.fit(np.array(x_train).reshape(-1,1), np.array(y_train).reshape(-1,1))``` where x_train = pd.read_sql_query('select * from YourTable', cnx).NeededColumn — Alexey Nikonov, Dec 03 '19 at 21:22

Ofer Sadan · Accepted Answer · 2019-10-16T20:54:50.343

213

The predict method expects a 2D array laid out like the training data: one row per sample, even if you only want to predict a single sample. In short, you can just replace

[0.58,0.76]

with

[[0.58,0.76]]

and it should work.

EDIT: This answer became popular, so I thought I'd add a little more explanation about ML. The short version: we can only use predict on data that has the same dimensionality as the training data (X).

In the example in question, we give the computer a bunch of rows in X (with 2 values each) and we show it the correct responses in y. When we want to predict using new values, our program expects the same layout: a bunch of rows. Even if we want to predict for just one row (with two values), that row has to be part of another array.
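
The shape requirement can be checked directly. Here is a minimal sketch that reuses the data from the question. It assumes scikit-learn 0.19 or later, where a 1D input raises a ValueError instead of a deprecation warning. The second test point (10, 10) is made up for illustration.

```python
import numpy as np
from sklearn import svm

# Training data from the question: 6 samples (rows), 2 features (columns).
X = np.array([[1, 2], [5, 8], [1.5, 1.8], [8, 8], [1, 0.6], [9, 11]])
y = [0, 1, 0, 1, 0, 1]
print(X.shape)  # (6, 2)

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# A single sample must still be a row of a 2D array: shape (1, 2).
single = np.array([[0.58, 0.76]])
print(single.shape)  # (1, 2)
pred_single = clf.predict(single)

# Several samples at once: one row per sample, shape (2, 2).
# The point (10, 10) is an illustrative value, not from the question.
pred_many = clf.predict([[0.58, 0.76], [10.0, 10.0]])

# A flat list has shape (2,), which is ambiguous between "one sample with
# two features" and "two samples with one feature", so it is rejected.
try:
    clf.predict([0.58, 0.76])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

predict returns one label per input row, so a single-row input gives back an array with one element.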

edited Oct 16 '19 at 20:54

answered Aug 07 '17 at 19:12

Ofer Sadan


36

but why does that work? I don't understand what the issue is. – Charlie Parker Sep 11 '17 at 18:49
2

how do you achieve this for larger dataframes? (dynamically) – Sip Aug 20 '18 at 11:58
6

Why does it have to be a 2D array? What is the reasoning behind this? – problemofficer - n.f. Monica Apr 25 '19 at 21:55

stackoverflowuser2010 · Answer 2 · 2017-08-08T17:38:21.060

23

The problem occurs when you run the prediction on the 1D array [0.58,0.76]. Fix it by reshaping the array before you call predict():

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style

style.use("ggplot")
from sklearn import svm

x = [1, 5, 1.5, 8, 1, 9]
y = [2, 8, 1.8, 8, 0.6, 11]

plt.scatter(x,y)
plt.show()

X = np.array([[1,2],
             [5,8],
             [1.5,1.8],
             [8,8],
             [1,0.6],
             [9,11]])

y = [0,1,0,1,0,1]

clf = svm.SVC(kernel='linear', C = 1.0)
clf.fit(X,y)

test = np.array([0.58, 0.76])
print(test)        # Produces: [0.58 0.76]
print(test.shape)  # Produces: (2,), a 1D array with 2 elements

test = test.reshape(1, -1)
print(test)        # Produces: [[0.58 0.76]]
print(test.shape)  # Produces: (1, 2), meaning 1 row, 2 cols

print(clf.predict(test))  # Produces [0], as expected

edited Aug 08 '17 at 17:38

answered Aug 07 '17 at 19:17

stackoverflowuser2010


Since pandas 0.19.0 you need to add `.values` before `.reshape(1, -1)` as in : `test = test.values.reshape(1, -1)` – yeliabsalohcin Feb 21 '23 at 23:06
My prediction error is with a LightGBM model. I know how to reshape, but LightGBM creates its own lgb.Dataset and I'm not sure how to set up the reshape or get prediction working for LightGBM. My posted question is at: https://stackoverflow.com/questions/75894373/lightgbm-prediction-issue-valueerror-input-numpy-ndarray-or-list-must-be-2-dime – manager_matt Mar 31 '23 at 04:48

score 12 · Answer 3 · edited Apr 28 '19 at 10:32


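I use the approach below.

from sklearn import linear_model

# df is a pandas DataFrame with 'year' and 'income' columns
reg = linear_model.LinearRegression()
reg.fit(df[['year']], df.income)  # double brackets keep X two-dimensional

reg.predict([[2136]])             # one sample with one feature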
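
The double brackets in df[['year']] are what keep the feature matrix two-dimensional. Here is a small sketch with made-up numbers. The year and income values are illustrative, not from the original answer.

```python
import pandas as pd
from sklearn import linear_model

# Hypothetical data: income grows by exactly 1000 per year.
df = pd.DataFrame({"year": [2000, 2001, 2002, 2003],
                   "income": [30000, 31000, 32000, 33000]})

print(df["year"].shape)    # (4,)   Series, 1D: fit() would reject this as X
print(df[["year"]].shape)  # (4, 1) DataFrame, 2D: 4 samples, 1 feature

reg = linear_model.LinearRegression()
reg.fit(df[["year"]], df["income"])

# predict() needs the same layout: one row per sample, one column per feature.
query = pd.DataFrame({"year": [2004]})
prediction = reg.predict(query)
print(prediction.shape)    # (1,)
```

Passing the query as a one-row DataFrame with the same column name is one option. A nested list such as [[2004]] has the same (1, 1) shape.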

edited Apr 28 '19 at 10:32

Zoe


answered Apr 28 '19 at 10:29

Vikas Rathour


score 7 · Answer 4 · answered Jun 04 '18 at 17:06

I faced the same issue, except that the instance I wanted to predict was a pandas.Series object.

I just needed to predict one input instance, which I took from a slice of my data.

df = pd.DataFrame(list(BiogasPlant.objects.all()))
test = df.iloc[-1:]       # sliced it here

In this case, you'll need to convert it into a NumPy array and then reshape it into a single row.

test2d = test.values.reshape(1,-1)

From the docs, values will convert a Series into a NumPy array.
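
Whether the reshape changes anything depends on how the row was selected. Here is a short sketch with a hypothetical three-column frame. The column names and values are made up.

```python
import pandas as pd

# Hypothetical data standing in for the real table.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

as_series = df.iloc[-1]   # integer index: Series, shape (3,)
as_frame = df.iloc[-1:]   # slice: DataFrame, shape (1, 3)

print(as_series.shape)    # (3,)
print(as_frame.shape)     # (1, 3)

# .values gives the underlying NumPy array; reshape(1, -1) makes it one row.
row_from_series = as_series.values.reshape(1, -1)
row_from_frame = as_frame.values.reshape(1, -1)  # already 2D, so unchanged

print(row_from_series.shape)  # (1, 3)
print(row_from_frame.shape)   # (1, 3)
```

Either way the result is a (1, n_features) array, which is the layout predict expects for a single sample.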

score 3 · Answer 5 · edited Jan 11 '19 at 05:46


I faced the same problem. You have to pass an array, and you need double square brackets so that the sample becomes a single row of a 2D array: the outer brackets create the array and the inner brackets make the sample one row of it.

So simply replace the last statement with:

print(clf.predict(np.array([[0.58,0.76]])))

edited Jan 11 '19 at 05:46

Akber Iqbal


answered Jan 11 '19 at 03:48

Satyam Mittal


np.array is a function, so call it with parentheses: np.array([[0.58,0.76]]), not np.array[[0.58,0.76]] – jonincanada Jul 12 '22 at 12:22

score 3 · Answer 6 · edited Feb 10 '21 at 10:34


Just put the argument inside double square brackets:

regressor.predict([[values]])

That worked for me.

edited Feb 10 '21 at 10:34

UseR10085


answered Oct 08 '19 at 08:35

Camunatas


score 1 · Answer 7 · edited Oct 05 '19 at 22:15


I was facing the same issue earlier and found a solution: you can try reg.predict([[3300]]).

The API used to accept a 1D input (with a deprecation warning), but now you need to pass a 2D array.

edited Oct 05 '19 at 22:15

Philzen


answered Oct 05 '19 at 19:45

FASIH AHMED


score 0 · Answer 8 · answered Apr 23 '19 at 08:48


With only one feature, my DataFrame was converted to a Series. I had to convert it back to a DataFrame, and then it worked.

if isinstance(X, pd.Series):  # assumes "import pandas as pd"
    X = X.to_frame()

answered Apr 23 '19 at 08:48

samuelru


score 0 · Answer 9 · answered Jun 06 '21 at 16:59


Just wrap your values in a second pair of square brackets.

For example:

If initially your x = [8,9,12,7,5]

change it to x = [[8,9,12,7,5]].

That gives x the shape (1, 5), a single row with five values, which fixes the dimension issue.

answered Jun 06 '21 at 16:59

Babatunde Mustapha


score 0 · Answer 10 · edited May 20 '23 at 22:13


You can do it like this:

np.array(x)[:, None]
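
Indexing with [:, None] adds a new axis of length 1, which turns a flat array into a single column. Here is a quick sketch of how it compares with the other spellings used in this thread. The list x is a small illustrative example.

```python
import numpy as np

x = [8, 9, 12, 7, 5]

column = np.array(x)[:, None]             # shape (5, 1): 5 samples, 1 feature
same_column = np.array(x).reshape(-1, 1)  # equivalent spelling
row = np.array([x])                       # shape (1, 5): 1 sample, 5 features
same_row = np.array(x).reshape(1, -1)     # equivalent spelling

print(column.shape, row.shape)              # (5, 1) (1, 5)
print(np.array_equal(column, same_column))  # True
print(np.array_equal(row, same_row))        # True
```

Which form is right depends on the data: use the column form when x holds one feature for many samples, and the row form when it holds all the features of a single sample.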

edited May 20 '23 at 22:13

sideshowbarker


answered Dec 03 '21 at 14:26

Miguel Tomás


score -1 · Answer 11 · answered May 26 '18 at 15:07

Convert the X and Y matrices (the independent and dependent variables, respectively) from int64 type to a DataFrame so that each goes from a 1D array to a 2D array, i.e. X = pd.DataFrame(X) and Y = pd.DataFrame(Y), where pd is the pandas module. After that, feature scaling no longer raises the error.
