K-nearest Neighbours Classification Algorithm in Python

Question

I found some code online for the K-NN classification technique, and I want to print all the predicted values along with the values of the test dataset. However, the output shows only part of the test dataset. How can I see the entire dataset?

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score

dataset = pd.read_csv(r'E:\pima-indians-diabetes.data.csv')  # raw string so the backslash is not read as an escape



x = dataset.iloc[:, 0:8]
y = dataset.iloc[:, 8]
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0, test_size=0.2)


sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)


classifier = KNeighborsClassifier(n_neighbors=10, p=2, metric='minkowski')

classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)


cm = confusion_matrix(y_test, y_pred)
print(cm)
print(f1_score(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
print(max(y_test.mean(), 1-y_test.mean()))
print(y_pred)
print(y_test)

This is the code I am using. Below is the output it produces.

runfile('C:/Users/Lenovo/Desktop/EE Codes/Knn with prima.py', wdir='C:/Users/Lenovo/Desktop/EE Codes')
[[91 10]
 [30 23]]
0.53488372093
0.74025974026
0.6558441558441559
[1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0
 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0
 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1
 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0
 0 0 0 0 0 0]
661    1
122    0
113    1
14     1
529    0
103    0
338    1
588    0
395    0
204    0
31     0
546    0
278    0
593    0
737    0
202    0
175    0
55     1
479    1
365    1
417    0
577    0
172    0
352    0
27     0
605    1
239    0
744    0
79     0
496    0
      ..
413    1
694    1
698    0
386    1
456    0
728    0
71     1
49     0
210    0
409    0
503    0
37     1
687    0
48     0
261    0
653    0
331    1
568    1
196    1
76     0
64     0
671    0
52     1
310    0
416    1
476    0
482    0
230    1
527    0
380    0
Name: 1, Length: 154, dtype: int64

As you can see, when printing the test dataset, the output shows values up to index 496, then two dots, and then the rest of the dataset. Can you please tell me how to see the entire dataset with no missing values in between? Thank you in advance.

score 0 · Answer 1 · answered Jan 31 '18 at 16:10

There are different solutions, depending on why you want the complete output:

write them to file (Saving prediction results to CSV)
concatenate with original dataframe (Merging results from model.predict() with original pandas DataFrame?)
change number of rows displayed (Is there a way to (pretty) print the entire Pandas Series / DataFrame?)
slice the output and print the parts one by one:
a_third = int(len(y_test) / 3)
print(y_test[:a_third])
print(y_test[a_third:-a_third])
print(y_test[-a_third:])

In my opinion, the last option is really ugly and should be avoided. You probably want to go with the third, but this depends on your intentions.
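As a rough sketch of the first three options, assuming y_test is a pandas Series and y_pred a NumPy array of the same length, as in the question. The data below is made up to stand in for the real split, and the column names "actual" and "predicted" and the index label "row" are only illustrative:

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the question's y_test (Series with shuffled row labels)
# and y_pred (NumPy array of predictions); replace them with the real objects.
rng = np.random.default_rng(0)
y_test = pd.Series(rng.integers(0, 2, size=154),
                   index=rng.permutation(768)[:154], name="actual")
y_pred = rng.integers(0, 2, size=154)

# Option 2: put actual and predicted values side by side.
# Assigning an array as a column is positional, so row i of y_pred
# lines up with row i of y_test and the original row labels are kept.
results = y_test.to_frame(name="actual")
results["predicted"] = y_pred

# Option 3a: to_string() renders every row, ignoring the display limit.
full_text = results.to_string()
print(full_text)

# Option 3b: lift the row limit only inside this block.
with pd.option_context("display.max_rows", None):
    print(y_test)

# Option 1: to_csv() returns the CSV text when no path is given;
# pass a file name instead to write it to disk.
csv_text = results.to_csv(index_label="row")
```

The option_context form leaves the global pandas settings untouched, which is handy if you only need the full output once.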

Since the user isn't using pandas, I think he just wants to print each line item on screen without having python "scrunch" the list up with the "..." in the middle. — Dylan, Jan 31 '18 at 16:17

score 0 · Answer 2 · answered Jan 31 '18 at 16:21

Well, "printing the entire dataset" is different from printing the "test" dataset, since we split the "entire" dataset into train and test. And since it looks like the output of print(y_test) is the thing you don't want truncated with the ".." in the middle, let's try printing that.

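When you call print(y_test), the output is abbreviated by pandas: y_test is a pandas Series with 154 rows, and by default pandas prints only the first and last rows of a Series longer than 60 rows (the display.max_rows option), with ".." in between, on the assumption that you don't want to see the whole thing.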

You could try this: Pythonic way to print list items

print(*y_test, sep='\n')

where sep='\n' tells Python to put each item on its own line, and the * character in front of y_test unpacks it into separate arguments, as explained over here: What does asterisk * mean in Python?
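One thing to be aware of: unpacking a Series passes only its values to print, so the row labels from the original output are dropped. A small sketch, using the first three rows shown in the question as sample data, of how the labels could be kept by looping over (label, value) pairs instead:

```python
import pandas as pd

# First three rows of the question's y_test output, used as sample data.
y_test = pd.Series([1, 0, 1], index=[661, 122, 113])

# Equivalent to print(1, 0, 1, sep='\n'): iterating over a Series
# yields its values, so the index labels do not appear.
print(*y_test, sep='\n')

# To keep the labels, iterate over (label, value) pairs.
lines = [f"{label}\t{value}" for label, value in y_test.items()]
print('\n'.join(lines))
```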

As an aside, tools like a Jupyter notebook make it easy to run those print commands in separate cells, so each result gets its own output area and stays separate and easier to read.

hit the little up arrow next to my answer so it gets marked as correct and this question gets closed. Have a good day. — Dylan, Jan 31 '18 at 18:23
