0
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd 

from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(X_train,Y_train)

# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, Y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, knn.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Classifier (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

error:

File "C:\Users\shaar\.spyder-py3\MLPractice\KNN.py", line 55, in <module>
plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1])

IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
hpaulj
Arvind SHa
  • In `X_set[y_set == j, 0]` what's the `shape` of `X_set` and `y_set`? Are those dimensions what you expect? – hpaulj Dec 11 '21 at 08:28
  • My guess is that both `X_set` and `y_set` are 2d. Thus `y_set==j` is itself 2d, and the added 0/1 is one index too many. This code probably expects `y_set` to be 1d, a flat array of "label" values. – hpaulj Dec 11 '21 at 17:39
  • @hpaulj `X_set` has shape (200, 2) and `y_set` has shape (200, 1). – Arvind SHa Dec 13 '21 at 08:53

3 Answers

0

This error occurs when you index a NumPy array with more indices than it has dimensions. For example, take a 1D array like

a = np.array([1, 2, 3, 4])

If you later index it as a[1, 2], NumPy reads that as a request for row 1, column 2 of a 2D array, which this array doesn't have. So only use as many comma-separated indices as the array has dimensions. If that's unclear, see https://www.w3schools.com/python/numpy/numpy_creating_arrays.asp
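
To make the point above concrete, here is a minimal sketch of the same mistake on a small 1D array. The array and the variable names are only for illustration and are not taken from the question's data.

```python
import numpy as np

a = np.array([1, 2, 3, 4])   # 1D array, shape (4,)
print(a[1])                  # one index for one dimension: fine

try:
    a[1, 2]                  # two indices for a 1D array
except IndexError as exc:
    message = str(exc)       # keep the message so it can be inspected
    print(message)
```

The comma itself is not the problem; it only has to match the number of dimensions the array actually has.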

Gourav Singh Rawat
0

NumPy arrays can be confusing when they have two dimensions but one of them has length 1. Your snippet doesn't show what goes into y_set in

X_set, y_set = X_test, Y_test

but I think if you look at the dimensions of y_set with y_set.shape you will get

(150,1)

(I assume there are 150 samples.) NumPy expects one index per dimension, and a 2D boolean mask such as y_set == j counts as two indices on its own. To drop the unwanted dimension, select its only column with index 0:

y_set_one_dimension = y_set[:,0]
print(y_set_one_dimension.shape)

as described in "How to access the ith column of a NumPy multidimensional array?"

The output will be:

(150,)

Now the scatter plot gets the expected 2 indices for the 2 dimensions of X_set and will work.
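
As a rough sketch of why this helps, the snippet below rebuilds the situation with random stand-in data. The shapes (200, 2) and (200, 1) follow the asker's comment; the values themselves are made up and are not the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X_set = rng.normal(size=(200, 2))           # stand-in features, shape (200, 2)
y_set = rng.integers(0, 2, size=(200, 1))   # stand-in labels, shape (200, 1)

mask_2d = (y_set == 1)                      # boolean mask, still shape (200, 1)
try:
    X_set[mask_2d, 0]                       # 2D mask (2 indices) + 0 = 3 indices
except IndexError as exc:
    error_message = str(exc)
    print(error_message)

y_flat = y_set[:, 0]                        # drop the length-1 axis, shape (200,)
mask_1d = (y_flat == 1)                     # boolean mask, shape (200,)
selected = X_set[mask_1d, 0]                # 1D mask (1 index) + 0 = 2 indices
print(selected.shape)
```

With the flattened labels, the mask picks rows and the trailing 0 or 1 picks the column, which is what the plotting loop expects.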

Note:

If y_set is a DataFrame, you have to convert it to a NumPy array first with:

yArray = np.array(y_set)
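
A small sketch of that conversion, assuming the labels live in a single-column DataFrame. The column name "Purchased" and the values are hypothetical and only serve the example.

```python
import numpy as np
import pandas as pd

# "Purchased" is a made-up column name used only for this illustration.
y_df = pd.DataFrame({"Purchased": [0, 1, 0, 1]})

y_array = np.array(y_df)   # 2D array, shape (4, 1)
y_flat = y_array[:, 0]     # 1D array, shape (4,)

print(y_array.shape, y_flat.shape)
```

The conversion alone still leaves a column vector, so the `[:, 0]` step is needed afterwards.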
Sylraka
-1
X_set, y_set = X_test, Y_test.ravel()
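
For context, `ravel()` flattens the (n, 1) label array to shape (n,), so `y_set == j` becomes a 1D mask. Below is a minimal sketch with a tiny made-up `Y_test`; the values are illustrative, not the question's data.

```python
import numpy as np

Y_test = np.array([[0], [1], [1], [0]])   # made-up column vector, shape (4, 1)

y_set = Y_test.ravel()                    # flattened labels, shape (4,)
print(y_set.shape)

# For a column vector this gives the same values as taking column 0.
same_values = np.array_equal(y_set, Y_test[:, 0])

# ravel() returns a view when it can, so no data is copied here.
is_view = np.shares_memory(Y_test, y_set)
print(same_values, is_view)
```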
Suraj Rao
evil
    This answer was reviewed in the [Low Quality Queue](https://stackoverflow.com/help/review-low-quality). Here are some guidelines for [How do I write a good answer?](https://stackoverflow.com/help/how-to-answer). Code only answers are **not considered good answers**, and are likely to be downvoted and/or deleted because they are **less useful** to a community of learners. It's only obvious to you. Explain what it does, and how it's different / **better** than existing answers. [From Review](https://stackoverflow.com/review/low-quality-posts/32314656) – Trenton McKinney Jul 23 '22 at 17:23