8

I want to draw a scatter plot using pylab, but some of my data are missing (NaN), like this:

a = [1, 2, 3]
b = [1, 2, None]

pylab.scatter(a,b) doesn't work.

Is there some way to draw the points that have real values while not displaying the NaN values?

FooBar
yangsuli

2 Answers

17

Things will work perfectly if you use NaNs. None is not the same thing. A NaN is a float.

As an example:

import numpy as np
import matplotlib.pyplot as plt

plt.scatter([1, 2, 3], [1, 2, np.nan])
plt.show()
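
If the data starts out as plain Python lists containing None, as in the question, one possible way to get NaNs is to convert the lists to float arrays. The following is a minimal sketch, not the only approach; the names `b_float`, `a_float`, `valid`, `a_valid` and `b_valid` are made up for illustration.

```python
import numpy as np

a = [1, 2, 3]
b = [1, 2, None]

# Requesting a float dtype makes NumPy store None as nan.
a_float = np.array(a, dtype=float)
b_float = np.array(b, dtype=float)

# Optional: keep only the points where both coordinates are finite.
valid = np.isfinite(a_float) & np.isfinite(b_float)
a_valid, b_valid = a_float[valid], b_float[valid]

print(b_float)
print(a_valid, b_valid)
```

Either `a_float, b_float` or the filtered `a_valid, b_valid` can then be passed to `plt.scatter`.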


Have a look at pandas or numpy masked arrays (and numpy.genfromtxt to load your data) if you want to handle missing data. Masked arrays are built into numpy, but pandas is an extremely useful library, and has very nice missing value functionality.

As an example:

import matplotlib.pyplot as plt
import pandas

x = pandas.Series([1, 2, 3])
y = pandas.Series([1, 2, None])
plt.scatter(x, y)
plt.show()

pandas uses NaNs to represent masked data, while masked arrays use a separate mask array. This means that masked arrays can potentially preserve the original data, while temporarily flagging it as "missing" or "bad". However, they use more memory, and have some hidden gotchas that can be avoided by using NaNs to represent missing data.
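
To make the difference concrete, here is a small sketch (the sample values and variable names are made up for illustration) showing that a masked array keeps the original values next to its mask, while the NaN approach overwrites them:

```python
import numpy as np

y = np.array([0.2, 0.9, 0.5, 1.0])

# Masked array: values above 0.7 are flagged, but the underlying data is kept.
masked = np.ma.masked_where(y > 0.7, y)

# NaN approach: the flagged values are overwritten in a copy of the data.
with_nan = y.copy()
with_nan[y > 0.7] = np.nan

print(masked.data)  # the original values are still available
print(masked.mask)  # True where a value is flagged
print(with_nan)     # flagged values are gone, replaced by nan
```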

As another example, using both masked arrays and NaNs, this time with a line plot:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)

y1 = np.ma.masked_where(y > 0.7, y)

y2 = y.copy()
y2[y > 0.7] = np.nan

fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)
for ax, ydata in zip(axes, [y, y1, y2]):
    ax.plot(x, ydata)
    ax.axhline(0.7, color='red')

axes[0].set_title('Original')
axes[1].set_title('Masked Arrays')
axes[2].set_title("Using NaN's")

fig.tight_layout()

plt.show()


Joe Kington
  • Things will not work perfectly if you use NaNs and semilogy... the plot will look fine, but it throws up this warning: RuntimeWarning: invalid value encountered in less_equal mask = a <= 0.0 – poleguy Jul 17 '15 at 16:10
1

Because you are drawing in 2D space, your points need to be defined by both an X and a Y value. If one of the values is None, that point cannot exist in 2D space and so cannot be plotted. You should therefore remove both the None and its corresponding value from the other list.

There are many ways to accomplish this. Here is one:

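import pylab

a = [1, 2, 3]
b = [1, None, 2]

# Walk both lists in step and drop every index where either value is None.
i = 0
while i < len(a):
    # Compare with "is None" so that valid zeros are kept.
    if a[i] is None or b[i] is None:
        a = a[:i] + a[i+1:]
        b = b[:i] + b[i+1:]
    else:
        i += 1

# Now a = [1, 3] and b = [1, 2]

pylab.scatter(a, b)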
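
The same filtering can also be sketched more compactly with `zip`. This is only one possible variant; the helper name `drop_missing_pairs` is hypothetical, and it tests for None explicitly so that zeros are kept.

```python
def drop_missing_pairs(xs, ys):
    """Return copies of xs and ys without the positions where either is None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    # Split the surviving pairs back into two lists.
    return [x for x, _ in pairs], [y for _, y in pairs]


a = [1, 2, 3]
b = [1, None, 2]

a_clean, b_clean = drop_missing_pairs(a, b)
print(a_clean, b_clean)
```

The cleaned lists can then be passed to `pylab.scatter(a_clean, b_clean)`.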
Ionut Hulub
  • 2
    Be careful with `if not a[i]...`. If either array has zeros, you'll remove them. Zero is a perfectly valid value! – Joe Kington Apr 02 '13 at 01:19