
If I have a scatter plot like this:

[image: scatter plot]

Is there any way to give the obvious outliers, like the three at the top, a different color in the same plot?

xzt
This is a hard question to answer because "outlier" isn't a term with a precise definition. [This post](http://stackoverflow.com/questions/11882393/matplotlib-disregard-outliers-when-plotting) might help, though. – Eli Sadoff Oct 30 '16 at 20:02

2 Answers


First, you need a criterion for what counts as an "outlier". Once you have one, you can mask the corresponding points in your plot. Selecting a subset of an array based on a condition is easy in numpy: if a is a numpy array, a[a <= 1] returns a new array containing only the values less than or equal to 1, i.e. with all values bigger than 1 "cut out".
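As a small illustration of the boolean indexing described above, here is a minimal sketch. The sample values and the names `mask`, `kept` and `dropped` are made up for this example.

```python
import numpy as np

# Hypothetical sample values, chosen only to illustrate the indexing
a = np.array([0.2, 1.5, -0.7, 3.0, 1.0])

mask = a <= 1        # boolean array, True where the condition holds
kept = a[mask]       # only the values that satisfy the condition
dropped = a[~mask]   # the complementary selection (values bigger than 1)

print(kept)
print(dropped)
```

The same mask can index any other array of the same length, which is what makes it convenient for selecting matching x and y values in a scatter plot.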

Plotting could then be done as follows

import numpy as np
import matplotlib.pyplot as plt

num = 1000
x = np.linspace(0, 100, num=num)
y = np.random.normal(size=num)

# boolean mask: True for points inside the distribution's width (|y| < 1)
inside = np.abs(y) < 1

fig = plt.figure()
ax = fig.add_subplot(111)
# plot points inside distribution's width
ax.scatter(x[inside], y[inside], marker="s", color="#2e91be")
# plot points outside distribution's width
ax.scatter(x[~inside], y[~inside], marker="d", color="#d46f9f")
plt.show()

producing

[image: scatter plot with the points inside and outside the distribution's width drawn in different colors and markers]

Here, we plot points drawn from a standard normal distribution and give all points outside the distribution's width (more than one standard deviation from the mean, i.e. |y| >= 1) a different color and marker.
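For outliers like the three in the question, one possible criterion (an assumption for this sketch, not the only reasonable choice) is to flag points that lie more than k standard deviations from the mean. The helper `outlier_mask`, the threshold `k=3.0` and the three artificially shifted points are all hypothetical and only serve to illustrate the masking approach.

```python
import numpy as np
import matplotlib.pyplot as plt


def outlier_mask(values, k=3.0):
    """Return True where a value lies more than k standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    return np.abs(values - values.mean()) > k * values.std()


rng = np.random.default_rng(0)
x = np.linspace(0, 100, num=200)
y = rng.normal(size=200)
y[[20, 90, 150]] += 15.0   # three artificial outliers, for illustration only

mask = outlier_mask(y)

fig, ax = plt.subplots()
# regular points first, then the flagged points in a different color
ax.scatter(x[~mask], y[~mask], color="#2e91be", label="regular")
ax.scatter(x[mask], y[mask], color="#d46f9f", label="outlier")
ax.legend()
plt.show()
```

Swapping in a different criterion only means changing how `mask` is computed; the two scatter calls stay the same.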

ImportanceOfBeingErnest

ImportanceOfBeingErnest has a great answer. Here's a one-liner I use when I have an array of category labels (enum-like classes) for the data points. It is especially useful when visualizing data that is already divided into classes.

import numpy as np
import matplotlib.pyplot as plt

num = 100
x = np.random.rand(num)
y = np.random.rand(num) * 2

# Create a simple classification criterion; the classes in this case will be 0, 1 and 2
classes = np.round(y)

# Passing the classes as the "c" argument is super convenient
plt.scatter(x, y, c=classes, cmap=plt.cm.Set1)
plt.show()

The corresponding scatter plot is divided into 3 colored regions, one per class.
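When the colors come from a class array like this, it can help to show which color belongs to which class. A minimal sketch, assuming a Matplotlib version that provides `legend_elements()` on the collection returned by `scatter` (3.1 or newer); the seeded random data is made up for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.random(100)
y = rng.random(100) * 2
classes = np.round(y).astype(int)   # 0, 1 or 2

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=classes, cmap=plt.cm.Set1)

# Build one legend entry per distinct value in `classes`
handles, labels = sc.legend_elements()
ax.legend(handles, labels, title="class")
plt.show()
```

For the outlier case, `classes` could just as well be a 0/1 array produced by whatever outlier criterion you choose.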

double-beep