I want to add a legend to my plot. I have text documents that I have processed with PCA so that I can plot a 2D graph, and I want a legend explaining which label each cluster color stands for.
My data is originally a list of strings (text documents). I used TfidfVectorizer and then PCA. To the matrix I get from applying the vectorizer I have added a label for each row, so that I know which group each document belongs to.
I can get the graph from the 2D PCA data and the colors are right (the clustering is correct), but I just want to add a legend saying:
- color green --> doctype1
- color red --> doctype2
- ...
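from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# documents is the list of strings (the text documents)
vectorizer = TfidfVectorizer()
# toarray() gives a plain ndarray; recent scikit-learn versions reject the np.matrix from todense()
data = vectorizer.fit_transform(documents).toarray()
pca = PCA(n_components=2).fit(data)
data2D = pca.transform(data)
# I used n_clusters=4 because I know this is the optimal number of clusters
kmeans = KMeans(n_clusters=4).fit(data)
clusters = kmeans.labels_.tolist()
# I know I fit and predict on the same data; it is just to get the right colors
y_means = kmeans.predict(data)
plt.scatter(data2D[:, 0], data2D[:, 1], c=y_means, zorder=0)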
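One possible way to get such a legend, sketched here under a few assumptions: matplotlib 3.1 or newer (for `legend_elements()`), synthetic stand-ins for `data2D` and `y_means`, and a hypothetical `cluster_names` dictionary that maps each cluster id to a doctype name. The names and the data are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for data2D and y_means from the code above (synthetic, illustration only)
rng = np.random.default_rng(0)
data2D = rng.normal(size=(40, 2))
y_means = np.arange(40) % 4  # four cluster ids: 0, 1, 2, 3

# Hypothetical mapping from cluster id to a readable name
cluster_names = {0: "doctype1", 1: "doctype2", 2: "doctype3", 3: "doctype4"}

fig, ax = plt.subplots()
scatter = ax.scatter(data2D[:, 0], data2D[:, 1], c=y_means, zorder=0)

# legend_elements() returns one handle per unique value in c, in ascending order
handles, _ = scatter.legend_elements()
# Build the labels in the same ascending order so they line up with the handles
labels = [cluster_names[i] for i in sorted(set(y_means.tolist()))]
legend = ax.legend(handles, labels, title="Clusters")
```

Note that KMeans numbers its clusters arbitrarily, so which cluster id corresponds to which doctype would have to be checked against the known row labels before filling in `cluster_names`.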
Thank you