
I want to add a legend to my plot. I have text documents that I reduced with PCA so that I can plot them in 2D, and I want a legend that explains which label each cluster color stands for.

My data is originally a list of strings (text documents). I applied TfidfVectorizer and then PCA. I have also added a label to each row of the matrix returned by the vectorizer, so that I know which group each document belongs to.

I can plot the 2D data from PCA and the colors are right (the clustering is correct), but I just want to add a legend saying:
- color green --> doctype1
- color red --> doctype2
- ...

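import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# vectorizer is the TfidfVectorizer instance; toarray() returns a plain ndarray
# (todense() returns an np.matrix, which recent scikit-learn versions reject)
data = vectorizer.fit_transform(documents).toarray()
pca = PCA(n_components=2).fit(data)
data2D = pca.transform(data)
kmeans = KMeans(n_clusters=4).fit(data)
clusters = kmeans.labels_.tolist()
y_means = kmeans.predict(data)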


plt.scatter(data2D[:,0], data2D[:,1], c=y_means, zorder=0)

# I used n_clusters = 4 because I know this is the optimal number of clusters
# documents is the list of strings (documents)
# I know I use the same data to fit and predict; it is just to get the right colors
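
A minimal sketch of one way to attach such a legend to the scatter call above, assuming Matplotlib 3.1 or newer, where the object returned by scatter has a legend_elements() method. The random data2D and y_means are stand-ins for the real PCA output and KMeans labels, and the names list is a hypothetical mapping from cluster id to document type:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for the PCA output and the KMeans labels (made-up data).
rng = np.random.default_rng(0)
data2D = rng.normal(size=(40, 2))
y_means = np.arange(40) % 4  # four cluster ids, 0..3

# Hypothetical names; the list index is the cluster id.
names = ["doctype1", "doctype2", "doctype3", "doctype4"]

fig, ax = plt.subplots()
scatter = ax.scatter(data2D[:, 0], data2D[:, 1], c=y_means, zorder=0)

# legend_elements() builds one handle per unique value in c, in ascending order,
# so the handles line up with the cluster ids 0..3.
handles, auto_labels = scatter.legend_elements()
legend = ax.legend(handles, names, title="Cluster")
```

The cluster ids that KMeans assigns are arbitrary, so which id corresponds to which document type has to be checked against the data before the names are filled in.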

Thank you

Enterrador99
  • Cannot quite understand where your difficulty is but how about this: https://stackoverflow.com/questions/39500265/manually-add-legend-items-python-matplotlib – Ankur Sinha Feb 19 '19 at 10:36
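
In the spirit of the manual legend items that the comment points to, a hedged sketch: choose the colors yourself and build one legend handle per cluster with matplotlib.patches.Patch. The data, the cluster_colors and cluster_names dictionaries, and the variable names are illustrative assumptions, not taken from the linked question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

# Stand-ins for the question's PCA output and KMeans labels (made-up data).
rng = np.random.default_rng(0)
data2D = rng.normal(size=(40, 2))
y_means = np.arange(40) % 4  # four cluster ids, 0..3

# Hypothetical mapping from cluster id to color and to document type name.
cluster_colors = {0: "green", 1: "red", 2: "blue", 3: "orange"}
cluster_names = {0: "doctype1", 1: "doctype2", 2: "doctype3", 3: "doctype4"}

fig, ax = plt.subplots()
point_colors = [cluster_colors[int(k)] for k in y_means]  # one explicit color per point
ax.scatter(data2D[:, 0], data2D[:, 1], c=point_colors, zorder=0)

# One colored patch per cluster, labelled with its document type.
handles = [Patch(color=cluster_colors[k], label=cluster_names[k])
           for k in sorted(cluster_colors)]
legend = ax.legend(handles=handles, title="Cluster")
```

Because the colors are assigned explicitly here, the legend and the points cannot drift apart, at the cost of maintaining the two dictionaries by hand.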

0 Answers