How to color data points based on some rules in matplotlib

Question

I have a signal, and I would like to color in red the points that are too far from the mean of the signal. For example:

import numpy as np
import matplotlib.pyplot as plt

k=[12,11,12,12,20,10,12,0,12,10,11]
x2=np.arange(1,12,1)
plt.scatter(x2,k, label="signal")
plt.show()

I would like to color the data points 20 and 0 in red and give them a special label like "warning". I read "matplotlib: how to change data points color based on some variable", but I am not sure how to apply it to my case.

swenzel · Accepted Answer · 2015-09-23T15:14:41.923

12

If you want different labels, you need separate plot calls, one per label.
Filter your data according to your rule.
In this case I took the values that are more than 1.5 standard deviations away from the mean. In case you don't know: in numpy you can index an array with a boolean mask, which keeps only the elements where the mask is True. You can also easily invert the mask with the complement operator ~.
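A minimal sketch of the boolean-mask indexing described above, using the signal from the question and the 1.5 standard deviation threshold from this answer; it needs only numpy.

```python
import numpy as np

k = np.array([12, 11, 12, 12, 20, 10, 12, 0, 12, 10, 11])

# Boolean mask: True where the value is more than 1.5 std away from the mean
warning = np.abs(k - np.mean(k)) > 1.5 * np.std(k)

print(warning)      # one True/False entry per element of k
print(k[warning])   # keeps only the elements where the mask is True (20 and 0)
print(k[~warning])  # ~ inverts the mask, so this keeps all the other elements
```

Both the mask and its complement have the same length as `k`, which is why the same mask can also be applied to `x2`.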

import matplotlib.pyplot as plt
import numpy as np

k=np.array([12,11,12,12,20,10,12,0,12,10,11])
x2=np.arange(1,12,1)

# find out which values are more than 1.5*std away from the mean
warning = np.abs(k-np.mean(k)) > 1.5*np.std(k)

# no plt.hold(True) needed: successive plot calls draw onto the same axes
# by default (plt.hold was deprecated in matplotlib 2.0 and removed in 3.0)

# draw a black line through all points, behind the scatter plots (zorder=-1)
plt.plot(x2, k, c='black', zorder=-1)

# scatter valid (not warning) points in blue (c='b')
plt.scatter(x2[~warning], k[~warning], label='signal', c='b')

# scatter warning points in red (c='r')
plt.scatter(x2[warning], k[warning], label='warning', c='r')

# draw the legend
plt.legend()

# show the figure
plt.show()

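The result is a black line through all the points, with the normal points drawn in blue and the two outliers (20 and 0) drawn in red, each group with its own legend entry.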

edited Sep 23 '15 at 15:14

answered Sep 18 '15 at 14:20

swenzel

swenzel, what if I wanted to use a plot instead of the scatter plot, but still wanted the warning points to be red? – user3841581 Sep 23 '15 at 14:53
Well, in that case the easiest solution would be to cheat a bit and add an extra plot that is rendered before the two scatter plots. It *is* possible to have markers in different colors within one plot, but getting the legend right might not be as easy. You could probably create a color scheme, then plot and annotate that, but for simplicity I'd prefer it this way. Once you want a gradual color change indicating something like severity, the color scheme approach might be better, though. – swenzel Sep 23 '15 at 15:07
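For the "gradual color change indicating severity" case mentioned in the comment above, here is a minimal sketch (not part of the original answer) that colors every point by its distance from the mean, measured in standard deviations, through a colormap. The severity measure, the `'Reds'` colormap and the black marker edges are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

k = np.array([12, 11, 12, 12, 20, 10, 12, 0, 12, 10, 11])
x2 = np.arange(1, 12, 1)

# "Severity": how many standard deviations each point lies from the mean
severity = np.abs(k - np.mean(k)) / np.std(k)

fig, ax = plt.subplots()
# plain line behind the markers, as in the accepted answer
ax.plot(x2, k, c='black', zorder=-1)
# a single scatter call; the colormap maps low severity to pale and high
# severity to dark red, and black edges keep the pale markers visible
sc = ax.scatter(x2, k, c=severity, cmap='Reds', edgecolors='black')
# with a continuous scale, a colorbar explains the colors instead of a legend
fig.colorbar(sc, ax=ax, label='distance from mean (std)')
plt.show()
```

The trade-off is the one described in the comment: there is no discrete "warning" legend entry, but no threshold has to be picked either.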
I have a small issue: assuming some entries are NaN, how do I compute the mean while ignoring those NaNs? – user3841581 Oct 26 '15 at 18:54
@user3841581 use [np.nanmean](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.nanmean.html). There is also [np.nanstd](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.nanstd.html) ;) – swenzel Oct 26 '15 at 22:08
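A sketch of the accepted answer's mask rewritten with np.nanmean and np.nanstd, assuming the NaN entries should simply be skipped; the positions of the NaNs in this example are made up. Any comparison with NaN is False, so NaN entries would never be flagged anyway; the explicit `np.isfinite` mask just makes that intent visible and sidesteps the invalid-value RuntimeWarning that some numpy versions emit for such comparisons.

```python
import numpy as np

# Signal from the question with two entries replaced by NaN (illustrative only)
k = np.array([12, 11, np.nan, 12, 20, 10, 12, 0, 12, np.nan, 11])

# nanmean/nanstd ignore NaN entries instead of returning NaN
mean = np.nanmean(k)
std = np.nanstd(k)

# Compare only the finite entries; NaN entries stay False (never a warning)
finite = np.isfinite(k)
warning = np.zeros(k.shape, dtype=bool)
warning[finite] = np.abs(k[finite] - mean) > 1.5 * std

print(mean, std)
print(np.flatnonzero(warning))  # indices of the flagged points
```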

score 7 · Answer 2 · edited Sep 18 '15 at 14:38

If you want just the colors, then try:

import numpy as np
import matplotlib.pyplot as plt

k=[12,11,12,12,20,10,12,0,12,10,11]
x2=np.arange(1,12,1)

# Calculate an outlier limit (I chose 2 standard deviations from the mean)
k_bar = np.mean(k)
outlier_limit = 2*np.std(k)
# Generate a colour vector: red for outliers, yellow for everything else
kcolors = ['red' if abs(value - k_bar) > outlier_limit else 'yellow' for value in k]

# Plot using the colour vector
plt.scatter(x2,k, label="signal", c = kcolors)
plt.show()
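Because this approach uses a single scatter call, the legend gets only one entry, so the "warning" label from the question is lost. One way to restore it, sketched here as an assumption rather than part of the original answer, is to pass proxy handles built with `matplotlib.lines.Line2D` to `legend`. The sketch also uses `np.where` as a vectorized alternative to the list comprehension; the threshold and colors are the ones from this answer.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

k = np.array([12, 11, 12, 12, 20, 10, 12, 0, 12, 10, 11])
x2 = np.arange(1, 12, 1)

# Same rule as above: outliers are more than 2 std away from the mean
is_outlier = np.abs(k - np.mean(k)) > 2 * np.std(k)
# Vectorized colour vector, equivalent to the list comprehension
kcolors = np.where(is_outlier, 'red', 'yellow').tolist()

fig, ax = plt.subplots()
ax.scatter(x2, k, c=kcolors)

# Proxy artists: empty Line2D objects that only exist to populate the legend
handles = [
    Line2D([], [], marker='o', linestyle='', color='yellow', label='signal'),
    Line2D([], [], marker='o', linestyle='', color='red', label='warning'),
]
ax.legend(handles=handles)
plt.show()
```

The proxies are never drawn on the axes, so the legend has to be kept in sync with the colour rule by hand; the two-scatter approach in the accepted answer avoids that.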

edited Sep 18 '15 at 14:38

Hannes Ovrén

answered Sep 18 '15 at 14:18

TMrtSmith

This only creates the color vector. It does not apply the actual color to the plot. – Hannes Ovrén Sep 18 '15 at 14:32
I removed my downvote. I don't think it was particularly harsh since the answer didn't actually answer the question, and the original code wasn't even valid Python. Now it's fine. – Hannes Ovrén Sep 18 '15 at 14:40
Fair comment. I was writing without an editor to test in! – TMrtSmith Sep 18 '15 at 14:42