KNeighborsClassifier .predict() function doesn't work

Question

I am working with the KNeighborsClassifier algorithm from the scikit-learn library in Python. I followed the basic instructions: I split my data and labels into training and test sets, then trained my model on the training data. Now I am trying to compute the accuracy of the predictions on the test data, but I get an error. Here is my code:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.cross_validation import train_test_split  # moved to sklearn.model_selection in scikit-learn >= 0.18
from sklearn.metrics import accuracy_score

data_train, data_test, label_train, label_test = train_test_split(
    df, labels, test_size=0.2, random_state=7)
mod = KNeighborsClassifier(n_neighbors=4)
mod.fit(data_train, label_train)
predictions = mod.predict(data_test)

print(accuracy_score(label_train, predictions))

The error I get:

ValueError: Found arrays with inconsistent numbers of samples: [140 558]

140 is the size of the test set and 558 is the size of the training set, based on test_size=0.2 (my data set has 698 samples). I verified that the labels and the data both have 698 entries. Still, I get this error, which seems to come from comparing the test set against the training set.
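A quick way to see which number belongs to which split is to print the shapes that `train_test_split` returns. The sketch below is hypothetical: since `df` and `labels` are not shown, it substitutes 698 random samples with 2 features (matching the shapes the asker reports in a later comment) and binary labels, and it imports `train_test_split` from `sklearn.model_selection`, its location in scikit-learn 0.18 and later.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df and labels: 698 samples, 2 features, binary labels.
rng = np.random.RandomState(7)
df = rng.rand(698, 2)
labels = rng.randint(0, 2, size=698)

data_train, data_test, label_train, label_test = train_test_split(
    df, labels, test_size=0.2, random_state=7)

# test_size=0.2 puts ceil(0.2 * 698) = 140 samples in the test set
# and the remaining 558 in the training set.
print(data_train.shape, data_test.shape)    # (558, 2) (140, 2)
print(label_train.shape, label_test.shape)  # (558,) (140,)

# Placeholder predictions: one per test sample, as mod.predict(data_test) would return.
placeholder_predictions = np.zeros(len(label_test), dtype=int)

# Scoring 558 training labels against 140 test predictions fails the
# sample-count check inside accuracy_score.
try:
    accuracy_score(label_train, placeholder_predictions)
    mismatch_caught = False
except ValueError as err:
    mismatch_caught = True
    print("ValueError:", err)
```

So 140 is the test split and 558 the training split, and the error comes from pairing training labels with test-set predictions.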

Does anyone know what is wrong here? Which data should I train my model on, and which should I use to compute the score?

Thanks!

score 2 · Accepted Answer · answered Oct 02 '16 at 00:44


You should calculate the accuracy_score with label_test, not label_train. You want to compare the actual labels of the test set, label_test, to the predictions from your model, predictions, for the test set.
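A minimal sketch of the corrected evaluation step follows. It keeps the question's variable names but, because `df` and `labels` are not shown, substitutes hypothetical random data (698 samples, 2 features, binary labels) and imports `train_test_split` from `sklearn.model_selection` (scikit-learn 0.18 and later). The substantive change is passing `label_test` to `accuracy_score`; the classifier's inherited `score` method computes the same mean accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for the question's df and labels.
rng = np.random.RandomState(7)
df = rng.rand(698, 2)
labels = rng.randint(0, 2, size=698)

data_train, data_test, label_train, label_test = train_test_split(
    df, labels, test_size=0.2, random_state=7)

mod = KNeighborsClassifier(n_neighbors=4)
mod.fit(data_train, label_train)       # learn from the training split
predictions = mod.predict(data_test)   # one prediction per test sample

# Compare the true test labels with the predictions for those same samples.
acc = accuracy_score(label_test, predictions)
print(acc)

# Equivalent shortcut: ClassifierMixin.score returns mean accuracy on (X, y).
print(mod.score(data_test, label_test))
```

With random features the accuracy value itself is meaningless; the point is that both arguments to `accuracy_score` now have 140 entries.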

answered Oct 02 '16 at 00:44 by lgaud

Thank you @lgaud! It worked! Now that I look at the documentation, it is kind of obvious. I happened to follow a mistake that was made in a tutorial. – semenoff Oct 02 '16 at 02:06

score 1 · Answer 2 · edited May 23 '17 at 12:33


Have you tried solving your issue with the approach from the following question?

sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()

edited May 23 '17 at 12:33 by Community

answered Oct 01 '16 at 21:13 by toshiro92

Thank you for the suggestion! I tried reshaping my labels with the following code: `label_train = np.reshape(label_train, (len(label_train), 1))` `label_test = np.reshape(label_test, (len(label_test), 1))` `print(label_train.shape)` `print(data_train.shape)` `print(data_test.shape)` `print(label_test.shape)`. Here is the result: `(558, 1)` `(558, 2)` `(140, 2)` `(140, 1)`. I still get the same error: `ValueError: Found arrays with inconsistent numbers of samples: [140 558]` – semenoff Oct 01 '16 at 23:42
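The reshape in this comment cannot fix the error: it only adds a second axis, while the number of rows (samples), which is what scikit-learn compares, stays 558 for the training labels and 140 for the test predictions. The sketch below illustrates this with placeholder NumPy arrays of those sizes; `check_same_length` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
import numpy as np

# Placeholder arrays with the sizes reported in the comment (values don't matter here).
label_train = np.zeros(558, dtype=int)
label_test = np.zeros(140, dtype=int)
predictions = np.zeros(140, dtype=int)  # one prediction per test sample

# Reshaping to a column vector adds an axis but keeps 558 rows.
label_train_col = np.reshape(label_train, (len(label_train), 1))
print(label_train_col.shape)  # (558, 1)


def check_same_length(y_true, y_pred):
    """Hypothetical helper: fail early if the arrays hold different numbers of samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true has %d samples but y_pred has %d"
                         % (len(y_true), len(y_pred)))


# Training labels vs. test predictions: still 558 vs. 140 after the reshape.
try:
    check_same_length(label_train_col, predictions)
    lengths_match = True
except ValueError as err:
    lengths_match = False
    print(err)  # y_true has 558 samples but y_pred has 140

# Test labels vs. test predictions: both have 140 samples, so no error is raised.
check_same_length(label_test, predictions)
test_ok = True
```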
