I am trying to train a simple model with scikit-learn's KNeighborsClassifier on the wine quality data. This is my code:

from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
import numpy as np

dataframe = pd.read_csv("winequality-white.csv")
dataframe = dataframe.drop(["fixed acidity", "pH", "sulphates"], axis=1)

test = dataframe[110:128]
train = dataframe[15:40]

Y = train["quality"]
X = train.drop(["quality"], axis=1)


#print(X)
#print(Y)

knn = KNeighborsClassifier()
knn.fit(X, Y)
testvals = np.array(test.loc[110, :])
testvals = testvals.reshape(1, -1)
print(knn.predict([[testvals]]))

I get the error "ValueError: Found array with dim 4. Estimator expected <= 2."

I'm fairly certain it has something to do with the shape of my array and I have tried to reshape it, but had no luck. What should I do?

ssnk001
  • What line is raising the ValueError? – ltd9938 Jul 06 '18 at 18:44
  • It was the very last line, because my testvals array was a 4-D array. The problem was solved when I popped the target off test and just passed one entry of the array through predict, I didn't need to reshape it at all. As shown by @Tgsmith61591 – ssnk001 Jul 06 '18 at 19:04

1 Answer

Consider the following (reproducible) example setup:

>>> import pandas as pd
>>> import numpy as np
>>> test = pd.DataFrame.from_records(data=np.random.rand(120, 4))
>>> testvals = np.array(test.loc[110, :])

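Your reshape already produces a 2-d array. Wrapping that result in two more list levels when you pass it to predict adds two more dimensions, so the estimator receives a 4-d array instead of the 2-d array it expects. Here is what that nested input looks like (this example reshapes to a column with (-1, 1) rather than your (1, -1) row, but the number of dimensions is the same):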

>>> [[testvals.reshape((-1, 1))]]
[[array([[ 0.25174728],
       [ 0.24603664],
       [ 0.01781963],
       [ 0.49317648]])]]

We can show this produces a 4-d array:

>>> np.asarray([[testvals.reshape((-1, 1))]]).ndim
4
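To see where each extra dimension comes from, here is a minimal sketch that follows the same row-shaped reshape as the question. The four values in `row` are made up for illustration and simply stand in for one sample of the wine data:

```python
import numpy as np

# Hypothetical 4-feature sample standing in for one row of the wine data.
row = np.array([0.25, 0.24, 0.02, 0.49])

# reshape(1, -1) already gives the 2-d (n_samples, n_features) layout.
reshaped = row.reshape(1, -1)

# Each extra pair of list brackets adds one more leading dimension.
wrapped = np.asarray([[reshaped]])

print(row.shape)       # 1-d input
print(reshaped.shape)  # 2-d, what predict wants
print(wrapped.shape)   # 4-d, what predict actually received
```

In other words, `reshaped` on its own would have been an acceptable input; the two extra bracket levels are what push it to four dimensions.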

scikit-learn expects a 2-d array of shape (n_samples, n_features). To fix it, if you want to predict for the entire matrix, just run:

knn.predict(test)

If you just want to predict for one sample, you could do:

knn.predict([test.loc[110].tolist()])

By the way, it's worth mentioning you have still not popped the target off of test, so the number of features won't match until you do:

y_test = test.pop('quality')
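Putting the pieces together, the whole flow might look roughly like the sketch below. It uses randomly generated data as a stand-in for winequality-white.csv, so the column names `f1` to `f4`, the row counts, and the label range are all assumptions made only so the example runs on its own:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the wine data: 4 hypothetical features plus "quality".
df = pd.DataFrame(rng.random((60, 4)), columns=["f1", "f2", "f3", "f4"])
df["quality"] = rng.integers(3, 9, size=60)

train = df.iloc[:40]
test = df.iloc[40:].copy()

X_train = train.drop(columns=["quality"])
y_train = train["quality"]

# Pop the target so test has the same feature columns as X_train.
y_test = test.pop("quality")

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)

# Whole test set: a DataFrame is already 2-d.
all_preds = knn.predict(test)

# Single sample: double brackets keep a one-row (2-d) DataFrame.
one_pred = knn.predict(test.iloc[[0]])

print(all_preds.shape, one_pred.shape)
```

Selecting with `test.iloc[[0]]` rather than `test.iloc[0]` is a design choice here: it returns a one-row DataFrame instead of a 1-d Series, so no reshape or extra brackets are needed.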

See also this question

TayTay