Scatterplot wherein each point color is a different mixture of K colors

Question

I have 2D data that I want to cluster into K clusters. Let's suppose K=4. After running the clustering algorithm, each point has a length-4 probability vector (whose entries add up to one) indicating the probability that the point belongs to each of the clusters.

My idea is to assign a color to each cluster and then make a scatter plot in which each point is colored as a mixture of all the cluster colors, weighted by its probability vector. If K=3, the colors could be R, G and B, so I could use something like ax.scatter(x1, x2, facecolors=probability_vectors), as suggested in this question. I used that solution to make the plot in the image, which has K=2 (fixing the Blue column to 0 in all vectors). I could still use it for K=3, but for K=4 I need something different. Any suggestions?

EDIT:

Using Tomáš Šíma's answer, if I use this code (for 5 clusters):

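import colorsys
from matplotlib import pyplot as plt
import numpy as np

N = 5
# Hues 1/N, 2/N, ..., 1 at full saturation and value
HSV = [(float(x)/N, 1, 1) for x in range(1, N+1)]
# Build a list; in Python 3, map() would return a one-shot iterator
RGB = [colorsys.hsv_to_rgb(*hsv) for hsv in HSV]

print(HSV)
plt.scatter(range(N), np.repeat(0.5, N), c=RGB, s=200)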

I get this output:

[(0.2, 1, 1), (0.4, 1, 1), (0.6, 1, 1), (0.8, 1, 1), (1.0, 1, 1)]

My problem now is that if I had one point whose probability vector is [0.5, 0.0, 0.0, 0.0, 0.5], i.e. one half for the yellow cluster and one half for the red one, its color should be orange. However, if I do 0.5*0.2 + 0.5*1.0 I get 0.6 which is blue. How should I compute the average in order to get orange instead of blue?

EDIT 2:

Got it, I just have to average the RGB colors of the clusters, weighted by the probability vector (instead of averaging the hues).

:D
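For readers following along, here is a minimal sketch of the two calculations compared above, using only the two cluster hues from the question (0.2 and 1.0). The variable names are illustrative, not from the original post. Hue is a position on a circle (0.0 and 1.0 are both red), which is why an arithmetic mean of hue numbers can land on an unrelated color.

```python
import colorsys

# The two cluster hues from the question: 0.2 (yellow-green) and 1.0 (red).
hue_a, hue_b = 0.2, 1.0
weights = (0.5, 0.5)

# Naive approach: average the hue numbers, then convert to RGB.
naive_hue = weights[0] * hue_a + weights[1] * hue_b
naive_rgb = colorsys.hsv_to_rgb(naive_hue, 1, 1)

# Approach from EDIT 2: convert each cluster color to RGB first,
# then take the weighted average channel by channel.
rgb_a = colorsys.hsv_to_rgb(hue_a, 1, 1)
rgb_b = colorsys.hsv_to_rgb(hue_b, 1, 1)
mixed_rgb = tuple(weights[0] * a + weights[1] * b for a, b in zip(rgb_a, rgb_b))

print("hue average:", naive_rgb)   # blue channel dominates
print("RGB average:", mixed_rgb)   # roughly (0.9, 0.5, 0.0), an orange tone
```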

You were right about mixing the colors, so I updated the answer. — Tomáš Šíma, Dec 24 '16 at 20:42

Tomáš Šíma · Accepted Answer · 2016-12-24T20:41:28.380

2

You are looking for the HSB color space (also known as HSV).

A color in HSB is made of 3 values:

H = Hue - the actual color
S = Saturation - how much color there is (the lower it is, the closer the color looks to gray)
B = Brightness

You can easily generate N well-separated colors from this space by spacing the hues evenly, and then convert them to RGB:

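import colorsys
N = 5
# N evenly spaced hues at fixed saturation and brightness
HSV = [(x*1.0/N, 0.5, 0.5) for x in range(N)]
# Build a list; in Python 3, map() would return a one-shot iterator
RGB = [colorsys.hsv_to_rgb(*hsv) for hsv in HSV]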

To mix the colors when painting the points, take a weighted average of the corresponding RGB colors, with the weights given by each point's probability vector.
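The weighted average can be written as a single matrix product. The sketch below is one possible way to do it, under some assumptions: `mix_colors` is a hypothetical helper name, the probabilities are random placeholder data rather than real clustering output, and the cluster colors use full saturation and brightness instead of the 0.5 values above.

```python
import colorsys
import numpy as np

def mix_colors(probs, cluster_rgb):
    """Blend cluster colors per point (hypothetical helper, not a library function).

    probs:       (N, K) array, each row a probability vector summing to 1
    cluster_rgb: (K, 3) array, one RGB color per cluster, values in [0, 1]
    returns:     (N, 3) array of RGB colors, one per point
    """
    mixed = np.asarray(probs) @ np.asarray(cluster_rgb)
    # Guard against floating-point overshoot slightly outside [0, 1].
    return np.clip(mixed, 0.0, 1.0)

K = 5
# One color per cluster, hues evenly spaced (assumed S = V = 1 for vivid colors).
cluster_rgb = np.array([colorsys.hsv_to_rgb(k / K, 1.0, 1.0) for k in range(K)])

# Placeholder soft assignments: 100 random probability vectors of length K.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(K), size=100)

point_rgb = mix_colors(probs, cluster_rgb)
print(point_rgb.shape)
```

An (N, 3) array of this form is what `plt.scatter` accepts for its `c` argument. Because the blend is a plain weighted sum, this approach is not tied to a particular K.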

edited Dec 24 '16 at 20:41

answered Dec 24 '16 at 16:20

Tomáš Šíma

Ok, so with that I have the color in RGB format for each point (great), which I can pass as the `color` parameter of `plt.plot()` if I were to plot the scatter manually, point by point (iterating over the dataset and the computed color matrix). But how can I use your solution with the `plt.scatter()` method? – hipoglucido Dec 24 '16 at 19:36
Ok I just have to pass the RGB matrix as the `c` parameter of `plt.scatter()` – hipoglucido Dec 24 '16 at 19:46

score 1 · Answer 2 · answered Dec 24 '16 at 17:12

You can use the RGBA scheme, as mentioned in the Colormap section of the matplotlib colors documentation. The A stands for alpha. Also read the ScalarMappable section.

Modifying from the answer quoted in the question:

import matplotlib.pyplot as plt
import numpy as np

x, y = np.random.random((2, 10))
rgba = np.random.random((10, 4))

fig, ax = plt.subplots()
ax.scatter(x, y, s=200, facecolors=rgba)
plt.show()

answered Dec 24 '16 at 17:12

Sahil M

Nice. However, your solution does not scale for `K>4`, right? – hipoglucido Dec 24 '16 at 19:41
