Why does matplotlib extrapolate/plot missing values?

Question

I have a situation where sometimes a whole series of data is not available. I'm plotting sensor values in real time, and the sensors can be turned on and off through user interaction, so I cannot be sure the values always form a continuous series. A user can start a sensor, later turn it off, and then turn it on again. In that case, matplotlib draws a line from the last end point to the new start point.

The data I plotted was as follows:

[[  5.          22.57011604]
 [  6.          22.57408142]
 [  7.          22.56350136]
 [  8.          22.56394005]
 [  9.          22.56790352]
 [ 10.          22.56451225]
 [ 11.          22.56481743]
 [ 12.          22.55789757]
  # Missing x values (13 to 28); matplotlib still draws a straight line across the gap
 [ 29.          22.55654716]
 [ 29.          22.56066513]
 [ 30.          22.56110382]
 [ 31.          22.55050468]
 [ 32.          22.56550789]
 [ 33.          22.56213379]
 [ 34.          22.5588932 ]
 [ 35.          22.54829407]
 [ 35.          22.56697655]
 [ 36.          22.56005478]
 [ 37.          22.5568161 ]
 [ 38.          22.54621696]
 [ 39.          22.55033493]
 [ 40.          22.55079269]
 [ 41.          22.55475616]
 [ 41.          22.54783821]
 [ 42.          22.55195618]]

My plot function, greatly simplified, looks like this:

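def plot(self, data):
    # data maps a sensor name to its x values, y values and line index
    for name, xy_dict in data.items():  # .iteritems() exists only in Python 2
        x_vals = xy_dict['x_values']
        y_vals = xy_dict['y_values']
        line_to_plot = xy_dict['line_number']
        # Replace the data of the existing Line2D for this sensor
        self.lines[line_to_plot].set_xdata(x_vals)
        self.lines[line_to_plot].set_ydata(y_vals)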

Does anyone know why it does this? Do I have to handle non-consecutive x and y values myself when plotting? It seems matplotlib should take care of this on its own. Otherwise, do I have to split the lists into smaller lists and plot those separately?

Bart · Accepted Answer · score 3 · 2016-07-04T08:43:15.073

One option is to insert dummy items wherever data is missing (in your case, apparently wherever x increases by more than 1) and mark them as masked elements. Matplotlib does not draw the line segments that connect to masked points, so the line breaks at each gap. For example:

import numpy as np
import matplotlib.pyplot as pl  # pyplot is the recommended interface; pylab is discouraged

# Your data, with some additional rows deleted to create more gaps
data = np.array(
[[  5., 22.57011604],
 [  6., 22.57408142],
 [  9., 22.56790352],
 [ 10., 22.56451225],
 [ 11., 22.56481743],
 [ 12., 22.55789757],
 [ 29., 22.55654716],
 [ 33., 22.56213379],
 [ 34., 22.5588932 ],
 [ 35., 22.54829407],
 [ 40., 22.55079269],
 [ 41., 22.55475616],
 [ 41., 22.54783821],
 [ 42., 22.55195618]])

x = data[:,0]
y = data[:,1]

# Difference from element to element in x
dx = x[1:]-x[:-1]

# Wherever dx > 1, insert a dummy item equal to -1
x2 = np.insert(x, np.where(dx>1)[0]+1, -1)
y2 = np.insert(y, np.where(dx>1)[0]+1, -1)

# As discussed in the comments, another option is to use e.g.:
#x2 = np.insert(x, np.where(dx>1)[0]+1, np.nan)
#y2 = np.insert(y, np.where(dx>1)[0]+1, np.nan)
# and skip the masking step below.

# Mask the -1 dummies (this assumes -1 never occurs in the real data)
x2 = np.ma.masked_where(x2 == -1, x2)
y2 = np.ma.masked_where(y2 == -1, y2)

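pl.figure()
pl.subplot(121)
pl.plot(x, y)    # original data: gaps are bridged by straight lines
pl.subplot(122)
pl.plot(x2, y2)  # masked dummies: the line breaks at each gap
pl.show()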
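For the real-time sensor case in the question, the insert-and-break idea can be wrapped in a small helper. This is a minimal sketch, assuming a gap means a step in x larger than a threshold (1 here, as in the answer). The name `break_at_gaps` and its `max_step` parameter are hypothetical. It uses the NaN variant from the comments and casts to float first, so integer x values also work.

```python
import numpy as np


def break_at_gaps(x, y, max_step=1.0):
    """Hypothetical helper: return float copies of x and y with a NaN
    inserted wherever x jumps by more than max_step.

    matplotlib does not draw line segments that touch a NaN, so each
    inserted NaN splits the plotted line into separate pieces.
    """
    x = np.asarray(x, dtype=np.float64)  # a float dtype is needed to hold NaN
    y = np.asarray(y, dtype=np.float64)
    # Index of the first element after each gap
    idx = np.where(np.diff(x) > max_step)[0] + 1
    return np.insert(x, idx, np.nan), np.insert(y, idx, np.nan)


x2, y2 = break_at_gaps([5, 6, 7, 12, 13], [1.0, 2.0, 3.0, 4.0, 5.0])
print(x2)
print(y2)
```

The returned arrays can be passed straight to `set_xdata`/`set_ydata` of an existing line, so each sensor keeps a single `Line2D` and its color.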

edited Jul 04 '16 at 08:43

answered Jul 04 '16 at 08:09


Real nice! Thanks for the tip. I'm currently looking into using np.nan to also discontinue a line, which apparently works. I will try that first and yours later. – enrm Jul 04 '16 at 08:17
The masking option of @Bart is great. I usually use NaN to achieve the same effect, `x2 = np.insert(x, np.where(dx>1)[0]+1, np.nan)`, and then there is no need for masking. – Aguy Jul 04 '16 at 08:38
Yes, I agree... I tend to prefer masked arrays (which preserves the invalid values) but in this case there is no need for that since we are explicitly adding the invalid values. – Bart Jul 04 '16 at 08:41
Does anyone have an idea of how efficient masking is? I.e., which is more expensive in terms of CPU: masking or inserting NaNs? – enrm Jul 04 '16 at 08:44
Masking seems to be relatively slow; for example, on an array of random numbers, `np.ma.masked_where(x<0, x)` is about 5x slower than `x[x<0]=np.nan` for a large `x`. – Bart Jul 04 '16 at 08:49
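A rough way to check that relative cost on your own machine is a `timeit` comparison. This is a sketch: the array size and run count are arbitrary choices, and the ratio will depend on the NumPy version and hardware, so no particular number is implied. Unlike the in-place `x[x<0]=np.nan` above, the NaN variant here works on a copy so that every run starts from the same data, and that copy is included in its timing.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)


def with_mask():
    # Masked array: the original values are kept, negatives are flagged invalid
    return np.ma.masked_where(x < 0, x)


def with_nan():
    # Plain float array: negatives are overwritten with NaN in a copy of x
    x_nan = x.copy()
    x_nan[x_nan < 0] = np.nan
    return x_nan


for f in (with_mask, with_nan):
    t = timeit.timeit(f, number=10)
    print(f"{f.__name__}: {t:.3f} s for 10 runs")
```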
I'm getting an error when trying to insert np.nan instead of -1:
x2 = np.insert(x_vals, np.where(dx>1)[0]+1, np.nan)
  File "c:\Python27\lib\site-packages\numpy\lib\function_base.py", line 3824, in insert
    new[slobj] = values
ValueError: cannot convert float NaN to integer – enrm Jul 04 '16 at 08:59
I can't reproduce that, but given this answer, http://stackoverflow.com/questions/12708807/numpy-integer-nan, it seems that your array has an integer dtype (`np.insert` keeps the dtype of its input, so `x2` inherits it from `x_vals`). You could try converting to float before inserting, e.g. `x_vals.astype(np.float64)`. – Bart Jul 04 '16 at 09:02
Ah, I did x = np.array(..) first, because dx = x[1:]-x[:-1] seems to work only with `numpy` arrays? – enrm Jul 04 '16 at 09:06
Yes, you need `numpy` arrays for my solution, so if you start off from Python lists, you need to convert them first (you can pass the `dtype` keyword, e.g. `x=np.array(x, dtype=np.float64)`). – Bart Jul 04 '16 at 09:08
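The error and the fix from this exchange can be reproduced in a few lines. `np.insert` keeps the dtype of its input, so inserting NaN into an integer array fails, while casting to float first works. The x values below are illustrative integers, not data from the question.

```python
import numpy as np

x_vals = [5, 6, 7, 12, 13]           # plain Python ints, e.g. sample indices
x_int = np.array(x_vals)             # NumPy infers an integer dtype here
dx = x_int[1:] - x_int[:-1]
gaps = np.where(dx > 1)[0] + 1

try:
    np.insert(x_int, gaps, np.nan)   # NaN cannot be stored in an integer array
except ValueError as err:
    print("integer array:", err)

# Cast to float when building the array, then inserting NaN works
x_float = np.array(x_vals, dtype=np.float64)
x2 = np.insert(x_float, gaps, np.nan)
print("float array:", x2)
```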
Ah, real nice. Tried it and it works like it should. Lots of thanks for your answers! <3 – enrm Jul 04 '16 at 09:11

score 3 · Answer 2 · answered Jul 04 '16 at 08:39

Another option is to include None or numpy.nan as values for y.

This, for example, shows a disconnected line:

import matplotlib.pyplot as plt
plt.plot([1,2,3,4,5],[5,6,None,7,8])  # no segment is drawn to or from the None point
plt.show()
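Why `None` works here: matplotlib ends up storing the line data as a float array, and NumPy turns `None` into NaN when such a list is cast to float, so `None` and `numpy.nan` have the same effect. A quick check of that conversion, using NumPy only:

```python
import numpy as np

y = [5, 6, None, 7, 8]
y_float = np.asarray(y, dtype=float)  # None becomes NaN in a float array
print(y_float)
print(np.isnan(y_float))              # only the third entry is NaN
```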

answered Jul 04 '16 at 08:39

honza_p


Found this out as well. I'm currently trying to incorporate @Bart's answer using np.nan, but I'm getting some errors. Regards! – enrm Jul 04 '16 at 09:01

score 1 · Answer 3 · answered Jul 04 '16 at 07:10

Matplotlib connects all consecutive data points with lines, regardless of how far apart their x values are.

If you want to avoid this, you could split your data at the missing x values and plot the resulting pieces separately.
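A minimal sketch of the splitting approach, using a subset of the question's data and assuming a gap means a step in x larger than 1 (as in the accepted answer). `np.split` cuts the arrays at the gap indices. Passing the same `color` to every `plot` call keeps the pieces looking like one series, although they are separate `Line2D` objects. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.array([5, 6, 7, 8, 29, 30, 31, 40, 41, 42], dtype=float)
y = np.array([22.57011604, 22.57408142, 22.56350136, 22.56394005,
              22.55654716, 22.56110382, 22.55050468,
              22.55079269, 22.55475616, 22.55195618])

# Split positions: the first index after each jump in x larger than 1
cuts = np.where(np.diff(x) > 1)[0] + 1
x_parts = np.split(x, cuts)
y_parts = np.split(y, cuts)

fig, ax = plt.subplots()
for xs, ys in zip(x_parts, y_parts):
    ax.plot(xs, ys, color="C0")  # same color for every segment
```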

answered Jul 04 '16 at 07:10

Alex Kamphuis


Can I still set the x and y data as before, or do I need to plot them as new lines? (That would not be ideal, since they would get different colors, etc.) I would prefer the data points to stay on the same matplotlib line. – enrm Jul 04 '16 at 07:43
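On this follow-up: a single `Line2D` can carry several disconnected pieces if NaN separators are placed in its data, so the `set_xdata`/`set_ydata` pattern from the question keeps working and the color stays the same. A minimal sketch using values from the question's data; the Agg backend is only there so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
(line,) = ax.plot([], [])  # one line per sensor, created once

# Two recording sessions separated by a NaN, so no segment joins x=8 and x=29
x = np.array([5, 6, 7, 8, np.nan, 29, 30, 31])
y = np.array([22.57011604, 22.57408142, 22.56350136, 22.56394005, np.nan,
              22.55654716, 22.56110382, 22.55050468])

line.set_xdata(x)
line.set_ydata(y)
ax.relim()           # recompute the data limits from the new line data
ax.autoscale_view()  # rescale the axes to those limits
fig.canvas.draw()
```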
