
I am running a clustering and trying to plot the result. A dummy data set is:

Data

import numpy as np

X = np.random.randn(10)
Y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])    # Labels of cluster 0 to 3

Cluster centers

centers = np.random.randn(4, 2)    # 4 centers, each center is a 2D point

Question

I want to make a scatter plot to show the points in data and color the points based on the cluster labels.

Then I want to superimpose the center points on the same scatter plot, in another shape (e.g. 'X') and a fifth color (as there are 4 clusters).


Comment

  • I turned to seaborn 0.6.0 but found no API to accomplish the task.
  • ggplot by yhat could make the scatter plot look nice, but the second plot would replace the first one.
  • I got confused by the color and cmap arguments in matplotlib, so I wonder whether I could use seaborn or ggplot to do it.
Zelong
  • Could you be more specific about `Then I want to superimpose the center points on the same scatter plot, in another shape (e.g. 'X') and a fifth color (as there are 4 clusters).`? – Srivatsan Jun 30 '15 at 11:39

2 Answers


The first part of your question can be done by passing the Cluster array as the colours (c=Cluster), which maps each label to a colour, and adding a colorbar to show that mapping. I only vaguely understood the second part of your question, but I believe this is what you are looking for:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(10)
y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])    # Labels of cluster 0 to 3
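centers = np.random.randn(4, 2)    # 4 centers, each center is a 2D point

fig = plt.figure()
ax = fig.add_subplot(111)
# c=Cluster colours each point by its label through the default colormap
scatter = ax.scatter(x, y, c=Cluster, s=50)
# Plot all centers in one call: column 0 holds the x values, column 1 the y values
ax.scatter(centers[:, 0], centers[:, 1], s=50, c='red', marker='+')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(scatter)    # shows which colour corresponds to which label

plt.show()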

which results in:

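[Figure: scatter plot of x vs y with the points coloured by cluster label, a colorbar, and the four centres shown as red + markers]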

where your "centres" are shown with a + marker. You can give them any colour you want, in the same way as was done for x and y.
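To match the question more literally (four fixed cluster colours, with the centres drawn as 'X' in a fifth colour), one option is to build a ListedColormap with exactly one colour per cluster and set vmin/vmax to -0.5 and 3.5 so each integer label falls in the middle of its own colour bin. This is a sketch under assumptions: the four "tab:" colour names and black for the centres are arbitrary choices, and it assumes a matplotlib version that has the filled "X" marker.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

x = np.random.randn(10)
y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])    # labels of clusters 0 to 3
centers = np.random.randn(4, 2)                       # 4 centres, each a 2D point

# Assumed palette: label k is drawn in cluster_colors[k]
cluster_colors = ["tab:blue", "tab:orange", "tab:green", "tab:purple"]
cmap = ListedColormap(cluster_colors)

fig, ax = plt.subplots()
# vmin=-0.5 and vmax=3.5 split the range into 4 equal bins centred on 0, 1, 2, 3
points = ax.scatter(x, y, c=Cluster, cmap=cmap, vmin=-0.5, vmax=3.5, s=50)
# All centres in one call, with the 'X' marker and a fifth colour (black)
center_plot = ax.scatter(centers[:, 0], centers[:, 1], c="black", marker="X", s=150)
ax.set_xlabel("x")
ax.set_ylabel("y")
cbar = fig.colorbar(points, ax=ax, ticks=range(4))  # one tick per cluster label
cbar.set_label("cluster")
fig.savefig("clusters.png")  # or plt.show() in an interactive session
```

Using a ListedColormap instead of the default continuous colormap keeps the cluster colours fixed no matter how many labels actually appear in a given plot.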

Srivatsan

Part of this has been answered here. The outline is:

plt.scatter(x, y, c=color)

Quoting the documentation of matplotlib:

c : color or sequence of color, optional, default [...] Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.
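The ambiguity described in that quote shows up most clearly when there are exactly three points. In the sketch below, a flat [1, 0, 0] has the same length as the data, so matplotlib treats it as three values to colour-map, while a one-row 2-D array [[1, 0, 0]] is read as a single RGB colour for every point. The check relies on matplotlib attaching a data array (get_array()) only to colour-mapped collections, and it assumes a reasonably recent matplotlib release that treats one-row RGB arrays as a single colour.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

# Exactly three points, so a flat RGB triple has the same length as the data
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0])

fig, (ax1, ax2) = plt.subplots(1, 2)
# Flat [1, 0, 0]: read as three data values and passed through the colormap
mapped = ax1.scatter(x, y, c=[1, 0, 0], s=100)
# One-row 2-D array: read as a single RGB colour shared by all three points
single = ax2.scatter(x, y, c=[[1, 0, 0]], s=100)

# Only the colour-mapped collection carries the values used for mapping
print(mapped.get_array())   # array holding the values 1, 0, 0
print(single.get_array())   # None
```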

So in your case, you need a color for each cluster and then fill the color array according to the cluster assignment of each point.

red = [1, 0, 0]
green = [0, 1, 0]
blue = [0, 0, 1]
# One RGB colour per point, picked by that point's cluster (here for five example points)
colors = [red, red, green, blue, green]
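Instead of writing the colour list out by hand, one way to fill it from the cluster assignment is to store one RGB row per cluster in a palette array and index it with the label array. NumPy fancy indexing then returns one row per point, which is the 2-D RGB array that c accepts. The palette is an assumption for illustration; since the question has four clusters, an arbitrary fourth colour (orange) is added to red, green and blue.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

x = np.random.randn(10)
y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])    # labels of clusters 0 to 3

# Assumed palette: one RGB row per cluster (red, green, blue, orange)
palette = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1],
                    [1, 0.5, 0]])

# Fancy indexing: row i of colors is the palette row for point i's cluster
colors = palette[Cluster]    # shape (10, 3)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=colors, s=50)
# Call plt.show() here to display the figure in an interactive session
```

Because colors has one row per point (10 rows of 3 values), its size no longer matches the number of points, so it cannot be mistaken for values to colour-map.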
jotrocken