Draw / Create Scatterplots of datasets with NaN

Question

I want to draw a scatter plot using pylab. However, some of my data are NaN, like this:

a = [1, 2, 3]
b = [1, 2, None]

pylab.scatter(a,b) doesn't work.

Is there some way that I could draw the points that have real values while not displaying the NaN values?

Would it suffice to remove the NaN values as described in http://stackoverflow.com/questions/11620914/removing-nan-values-from-an-array ? — Giant Molecular Klaus, Apr 02 '13 at 00:55

Joe Kington · Accepted Answer · 2013-04-02T01:16:22.313

Things will work perfectly if you use NaNs. None is not the same thing. A NaN is a float.

As an example:

import numpy as np
import matplotlib.pyplot as plt

plt.scatter([1, 2, 3], [1, 2, np.nan])
plt.show()
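If the data already sits in a plain list that contains None, one way to get NaNs is to build a float array from it. This is only a minimal sketch, assuming the list holds nothing but numbers and None; the names `b_float` and `valid` are made up for illustration:

```python
import numpy as np

a = [1, 2, 3]
b = [1, 2, None]

# With dtype=float, NumPy stores None as NaN
b_float = np.array(b, dtype=float)

# Boolean mask marking the entries that are real numbers
valid = ~np.isnan(b_float)

print(b_float)
print(valid)
```

Passing `a` and `b_float` to `plt.scatter` should then behave like the example above, drawing only the points whose values are not NaN.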


Have a look at pandas or numpy masked arrays (and numpy.genfromtxt to load your data) if you want to handle missing data. Masked arrays are built into numpy, but pandas is an extremely useful library, and has very nice missing value functionality.

As an example:

import matplotlib.pyplot as plt
import pandas

x = pandas.Series([1, 2, 3])
y = pandas.Series([1, 2, None])
plt.scatter(x, y)
plt.show()
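To give a rough idea of the missing value functionality, here is a small sketch that counts and drops incomplete rows before plotting. The column names `x` and `y` and the variable names are assumptions chosen for this example:

```python
import pandas

# Hypothetical two-column frame; None is stored as NaN in the float column
df = pandas.DataFrame({"x": [1, 2, 3], "y": [1, 2, None]})

# Count the missing entries in y
n_missing = int(df["y"].isna().sum())

# Drop every row that has a missing value in any column
clean = df.dropna()

print(n_missing)
print(clean)
```

`clean["x"]` and `clean["y"]` can then be handed to `plt.scatter` if you would rather remove the incomplete points explicitly.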

pandas uses NaNs to represent masked data, while masked arrays use a separate mask array. This means that masked arrays can potentially preserve the original data, while temporarily flagging it as "missing" or "bad". However, they use more memory, and have a few hidden gotchas that can be avoided by using NaNs to represent missing data.
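The difference can be seen in a small sketch (the array values and variable names here are arbitrary, chosen only for illustration). The masked array keeps the original numbers underneath the mask, while the NaN copy overwrites them:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])

# Masked array: values above 2.5 are flagged, not overwritten
masked = np.ma.masked_where(y > 2.5, y)

# NaN approach: the flagged values are replaced and lost
nan_copy = y.copy()
nan_copy[y > 2.5] = np.nan

print(masked.data)           # underlying data is still available
print(masked.mean())         # mean over the unmasked values only
print(np.nanmean(nan_copy))  # NaN-aware mean
print(nan_copy.mean())       # a plain mean propagates NaN
```

Note that with NaNs you need the NaN-aware functions such as `np.nanmean` for aggregations, whereas the masked array skips masked entries on its own.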

As another example, using both masked arrays and NaNs, this time with a line plot:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)

y1 = np.ma.masked_where(y > 0.7, y)

y2 = y.copy()
y2[y > 0.7] = np.nan

fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)
for ax, ydata in zip(axes, [y, y1, y2]):
    ax.plot(x, ydata)
    ax.axhline(0.7, color='red')

axes[0].set_title('Original')
axes[1].set_title('Masked Arrays')
axes[2].set_title("Using NaN's")

fig.tight_layout()

plt.show()


Things will not work perfectly if you use NaNs and semilogy... the plot will look fine, but it throws up this warning: RuntimeWarning: invalid value encountered in less_equal mask = a <= 0.0 — poleguy, Jul 17 '15 at 16:10

Ionut Hulub · Answer 2 · 2013-04-02T01:21:04.570


Because you are drawing in 2D space, your points need to be defined by both an X and a Y value. If one of the values is None, that point cannot exist in 2D space, so it cannot be plotted. Hence you should remove both the None and its corresponding value from the other list.

There are many ways to accomplish this. Here is one:

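import pylab

a = [1, 2, 3]
b = [1, None, 2]

# Walk both lists in step and drop every pair that contains a None.
i = 0
while i < len(a):
    if a[i] is None or b[i] is None:
        a = a[:i] + a[i+1:]
        b = b[:i] + b[i+1:]
    else:
        i += 1  # advance only when nothing was removed

# Now a = [1, 3] and b = [1, 2]

pylab.scatter(a, b)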
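Another of the many ways, sketched here with `zip` and list comprehensions instead of the index loop (the names `pairs`, `a_clean` and `b_clean` are just illustrative):

```python
a = [1, 2, 3]
b = [1, None, 2]

# Keep only the (x, y) pairs where neither value is None
pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

# Split the surviving pairs back into two lists
a_clean = [x for x, _ in pairs]
b_clean = [y for _, y in pairs]

print(a_clean, b_clean)
```

This leaves the original lists untouched and gives the same filtered values, which can then be passed to `pylab.scatter`.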

edited Apr 02 '13 at 01:21

answered Apr 02 '13 at 01:05



Be careful with `if not a[i]...`. If either array has zeros, you'll remove them. Zero is a perfectly valid value! – Joe Kington Apr 02 '13 at 01:19
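A tiny sketch of that pitfall, using a made-up list: a truthiness test throws away zeros along with None, while an explicit `is not None` check keeps them.

```python
values = [0, 1, None, 2]

# Truthiness test: 0 is falsy, so it is dropped together with None
by_truthiness = [v for v in values if v]

# Identity test: only None is dropped, 0 survives
by_identity = [v for v in values if v is not None]

print(by_truthiness)
print(by_identity)
```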
