Stupid way to plot a scatter plot
Suppose I have a dataset with 3 classes. The following code gives me a good graph with a correct legend by plotting the data class by class.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs()
X0 = X[y==0]
X1 = X[y==1]
X2 = X[y==2]
ax = plt.subplot(1,1,1)
ax.scatter(X0[:,0],X0[:,1], lw=0, s=40)
ax.scatter(X1[:,0],X1[:,1], lw=0, s=40)
ax.scatter(X2[:,0],X2[:,1], lw=0, s=40)
ax.legend(['0','1','2'])
Better way to plot a scatter plot
However, if I have a dataset with 3000 classes, the above method no longer works. (You wouldn't expect me to write 3000 lines, one for each class, right?) So I came up with the following plotting code.
num_classes = len(set(y))
palette = np.array(sns.color_palette("hls", num_classes))
ax = plt.subplot(1,1,1)
ax.scatter(X[:,0], X[:,1], lw=0, s=40, c=palette[y.astype(int)])
ax.legend(['0','1','2'])
This code is compact: it plots all the classes with a single scatter call. However, the legend is not shown correctly this time.
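As far as I can tell, the reason is that a single scatter call creates only one artist, so the legend has just one handle to pair with the labels. Here is a minimal sketch to check that. It uses random data from NumPy as a stand-in for make_blobs and colors the points by the raw labels, so the data and the names num_artists and num_entries are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: 30 random points with 3 class labels (not the make_blobs data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.repeat([0, 1, 2], 10)

fig, ax = plt.subplots()
# One scatter call for all classes, as in the single-call version above.
ax.scatter(X[:, 0], X[:, 1], lw=0, s=40, c=y)
legend = ax.legend(['0', '1', '2'])

# Count the artists on the axes and the entries that ended up in the legend.
num_artists = len(ax.collections)
num_entries = len(legend.get_texts())
print(num_artists, num_entries)
```

If the counts come out smaller than the number of classes, the extra labels simply have no artist to attach to.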
Question
How can I get a correct legend, with one entry per class, when I plot all the classes with a single call like the following?
ax.scatter(X[:,0], X[:,1], lw=0, s=40, c=palette[y.astype(int)])
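One direction that might work is to keep the single scatter call and build the legend entries by hand from proxy artists, one per class, using the same palette. The following is only a sketch under assumptions: the data is random stand-in data, the labels are assumed to be the integers 0..n-1, and a matplotlib colormap stands in for the seaborn "hls" palette.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

# Stand-in data: 30 random points with integer labels 0, 1, 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.repeat([0, 1, 2], 10)

classes = np.unique(y)
# One RGBA color per class; a stand-in for sns.color_palette("hls", n).
palette = plt.get_cmap("hsv")(np.linspace(0, 1, len(classes), endpoint=False))

fig, ax = plt.subplots()
# Still a single scatter call for all the points.
ax.scatter(X[:, 0], X[:, 1], lw=0, s=40, c=palette[y])

# Proxy artists: empty lines that only carry a marker, a color and a label.
handles = [
    Line2D([], [], marker='o', linestyle='None',
           color=tuple(palette[i]), label=str(c))
    for i, c in enumerate(classes)
]
legend = ax.legend(handles=handles)
labels_shown = [t.get_text() for t in legend.get_texts()]
```

With 3000 classes the legend itself would be very long, so this probably only makes sense for a subset of the classes.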