7

I have some large arrays (~100 million points) that I need to interactively plot. I am currently using Matplotlib. Plotting the arrays as-is gets very slow and is a waste since you can't visualize that many points anyway.

So I made a min/max decimation function that I tied to the 'xlim_changed' callback of the axis. I went with a min/max approach because the data contains fast spikes that I do not want to miss by just stepping through the data. There are more wrappers that crop to the x-limits and skip processing under certain conditions, but the relevant part is below:

def min_max_downsample(x,y,num_bins):
    """ Break the data into num_bins and returns min/max for each bin"""
    pts_per_bin = x.size // num_bins    

    #Create temp to hold the reshaped & slightly cropped y
    y_temp = y[:num_bins*pts_per_bin].reshape((num_bins, pts_per_bin))
    y_out      = np.empty((num_bins,2))
    #Take the min/max by rows.
    y_out[:,0] = y_temp.max(axis=1)
    y_out[:,1] = y_temp.min(axis=1)
    y_out = y_out.ravel()

    #This duplicates the x-value for each min/max y-pair
    x_out = np.empty((num_bins,2))
    x_out[:] = x[:num_bins*pts_per_bin:pts_per_bin,np.newaxis]
    x_out = x_out.ravel()
    return x_out, y_out

This works pretty well and is sufficiently fast (~80ms on 1e8 points & 2k bins). There is very little lag as it periodically recalculates & updates the line's x & y-data.
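For reference, here is a minimal sketch (hypothetical, not the actual wrapper code, which also buffers data beyond the visible x-limits) of how such a downsampler can be tied to the 'xlim_changed' callback and the line's set_data, assuming the x_big/y_big arrays from the usage example at the end of the question:

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
line, = ax.plot(*min_max_downsample(x_big, y_big, 2000))

def on_xlim_changed(ax):
    x0, x1 = ax.get_xlim()
    i0, i1 = np.searchsorted(x_big, [x0, x1])
    if i1 - i0 > 2*2000:
        #Still too many points in view: decimate the visible slice
        line.set_data(*min_max_downsample(x_big[i0:i1], y_big[i0:i1], 2000))
    else:
        #Few enough points: plot the raw slice
        line.set_data(x_big[i0:i1], y_big[i0:i1])
    ax.figure.canvas.draw_idle()

ax.callbacks.connect('xlim_changed', on_xlim_changed)
plt.show()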

However, my only complaint is in the x-data. min_max_downsample duplicates the x-value of each bin's left edge and doesn't return the true x-location of the y min/max pairs. I typically set the number of bins to double the axis pixel width, so you can't really see the difference because the bins are so small... but I know it's there... and it bugs me.

So, attempt number 2, which does return the actual x-values for every min/max pair. However, it is about 5x slower.

def min_max_downsample_v2(x,y,num_bins):
    pts_per_bin = x.size // num_bins
    #Create temp to hold the reshaped & slightly cropped y
    y_temp = y[:num_bins*pts_per_bin].reshape((num_bins, pts_per_bin))
    #use argmax/min to get column locations
    cc_max = y_temp.argmax(axis=1)
    cc_min = y_temp.argmin(axis=1)    
    rr = np.arange(0,num_bins)
    #compute the flat index to where these are
    flat_max = cc_max + rr*pts_per_bin
    flat_min = cc_min + rr*pts_per_bin
    #Create a boolean mask of these locations
    mm_mask  = np.full((x.size,), False)
    mm_mask[flat_max] = True
    mm_mask[flat_min] = True  
    x_out = x[mm_mask]    
    y_out = y[mm_mask]  
    return x_out, y_out

This takes roughly 400+ ms on my machine, which becomes pretty noticeable. So my question is basically: is there a way to go faster and provide the same results? The bottleneck is mostly in the numpy.argmin and numpy.argmax functions, which are a good bit slower than numpy.min and numpy.max.
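For anyone who wants to reproduce that observation, here is an illustrative IPython check (the exact numbers are machine-dependent) on an array of the same shape as the reshaped 1e8-point / 2000-bin case:

import numpy as np
y_temp = np.random.rand(2000, 50000)  #same shape as the reshaped 1e8-point array

%timeit y_temp.max(axis=1)     #plain reduction
%timeit y_temp.argmax(axis=1)  #index-returning reduction, typically a good bit slower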

The answer might be to just live with version #1 since it visually doesn't really matter. Or maybe try to speed it up with something like Cython (which I have never used).

FYI, I am using Python 3.6.4 on Windows ... example usage would be something like this:

x_big = np.linspace(0, 10, 100000000)
y_big = np.cos(x_big)
x_small, y_small = min_max_downsample(x_big, y_big, 2000)    #Fast but not exactly correct.
x_small, y_small = min_max_downsample_v2(x_big, y_big, 2000) #Correct but not exactly fast.
Aero Engy
  • From your question, I take it this is being called multiple times? If that's the case you can speed things up immensely by using the `set_data` method of an existing line in matplotlib. – user2699 Jan 31 '19 at 20:03
  • @user2699 Yes, that is exactly what I do with the output of this function. What I posted is the main bottleneck because it gets called frequently as I pan around. However, I did change the outer part of this code (not shown) so that it gets called less frequently, by buffering more data beyond the x-limits of the plot. So it only has to re-calc when you pan beyond a certain limit. That has helped the user experience a good bit. – Aero Engy Jan 31 '19 at 21:42
  • I've faced similar problems (although on less than 100 million points), and can include part of the code I've written that's relevant. What I found most effective is caching the downsampled signal and only downsampling when it makes a substantial difference in the view. As long as the data is (somewhat) uniformly sampled having exact x points is irrelevant. – user2699 Jan 31 '19 at 22:31
  • @user2699 I would like to see how you do the caching. I am doing something like that now: basically loading data beyond the x-limits by some amount and making the point density a bit more than needed. Then it triggers a new recalc only if you zoom in far enough that the point density gets low, or if you pan beyond the buffer's edge. I would like to see how someone else does it since I made most of it up and I know there are some logic holes. Mine will needlessly recalc sometimes. – Aero Engy Jan 31 '19 at 23:03
  • I've included it in an answer. Let me know how it works for you, it's been pretty robust in my use but there may be issues I've missed. – user2699 Jan 31 '19 at 23:49

4 Answers

3

I managed to get improved performance by using the output of arg(min|max) directly to index the data arrays. This comes at the cost of an extra call to np.sort, but the axis to be sorted has only two elements (the min/max indices) and the overall array is rather small (one row per bin):

def min_max_downsample_v3(x, y, num_bins):
    pts_per_bin = x.size // num_bins

    x_view = x[:pts_per_bin*num_bins].reshape(num_bins, pts_per_bin)
    y_view = y[:pts_per_bin*num_bins].reshape(num_bins, pts_per_bin)
    i_min = np.argmin(y_view, axis=1)
    i_max = np.argmax(y_view, axis=1)

    r_index = np.repeat(np.arange(num_bins), 2)
    c_index = np.sort(np.stack((i_min, i_max), axis=1)).ravel()

    return x_view[r_index, c_index], y_view[r_index, c_index]
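For what it's worth, a quick sanity check (hypothetical, using the example arrays from the question) that this returns the same points as v2; the outputs should match as long as no bin's min and max fall on the same sample:

x2, y2 = min_max_downsample_v2(x_big, y_big, 2000)
x3, y3 = min_max_downsample_v3(x_big, y_big, 2000)
print(np.array_equal(x2, x3), np.array_equal(y2, y3))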

I checked the timings for your example and I obtained:

  • min_max_downsample_v1: 110 ms ± 5 ms
  • min_max_downsample_v2: 240 ms ± 8.01 ms
  • min_max_downsample_v3: 164 ms ± 1.23 ms

I also checked returning directly after the calls to arg(min|max) and the result was likewise 164 ms, i.e. there's essentially no overhead left after that point.

a_guest
  • Nice indexing trick and avoiding the mask. I wouldn't have thought that `sort` would be faster. I am getting 87, 421, and 338ms for v1, v2, and v3. I'd like to get it back to that sub 100ms range but I'll take every improvement I can get. However, I did notice the code above will error if x is not exactly num_bin*pts_per_bin elements long ... which is often. That is why I was doing `y_temp = y[:num_bins*pts_per_bin].reshape` – Aero Engy Jan 31 '19 at 16:28
  • @AeroEngy You're right, I've included that. Could you check the timing result when you return right after the `arg(min|max)` calls in order to see the contribution from the following call to `sort`? There are different ways to merge the indices but on my machine it really didn't matter. – a_guest Jan 31 '19 at 18:01
  • I ran it through a profile and got 339ms total (142ms for `argmax` & 196ms for `argmin`). The `sort` was trivial at 45us. So I think you have removed all other overhead. To do it any faster would require some unknown new approach. – Aero Engy Jan 31 '19 at 18:12
  • 1
    @AeroEngy Maybe you could follow [this approach](https://stackoverflow.com/a/12200671/3767239) in order to compile your own `argmin_max` function that computes the argmin and argmax in one go instead of iterating over the array twice. For an array of that size the difference should be measurable. – a_guest Jan 31 '19 at 20:35
  • Thanks for the link! I'm going to first try to make a numba version of a single pass argminmax to see how it compares. I haven't used numba or cython before but it looks interesting. I'm a recent convert to Python from Matlab so I am used to speeding up code with MEX which seem like a similar idea. – Aero Engy Jan 31 '19 at 22:07
  • Using your answer + that link helped me make a nice & fast version with numba. I posted it as a separate answer for posterity but will accept your answer since you led me down the correct path. Thanks again! – Aero Engy Feb 01 '19 at 03:05
2

So this doesn't address speeding up the specific function in question, but it does show a few ways of plotting a line with a large number of points somewhat effectively. This assumes that the x points are ordered and uniformly (or close to uniformly) sampled.

Setup

from pylab import *

Here's a function I like that reduces the number of points by randomly choosing one in each interval. It isn't guaranteed to show every peak in the data, but it doesn't have as many problems as directly decimating the data, and is fast.

def calc_rand(y, factor):
    split = y[:len(y)//factor*factor].reshape(-1, factor)
    idx = randint(0, split.shape[-1], split.shape[0])
    return split[arange(split.shape[0]), idx]

And here are the min and max to see the signal envelope:

def calc_env(y, factor):
    """
    y : 1D signal
    factor : amount to reduce y by (actually returns twice this for min and max)
    Calculate envelope (interleaved min and max points) for y
    """
    split = y[:len(y)//factor*factor].reshape(-1, factor)
    upper = split.max(axis=-1)
    lower = split.min(axis=-1)
    return c_[upper, lower].flatten()

The following function can take either of these, and uses them to reduce the data being drawn. The number of points actually taken is 5000 by default, which should far exceed a monitor's resolution. Data is cached after it's reduced. Memory may be an issue, especially with large amounts of data, but it shouldn't exceed the amount required by the original signal.

def plot_bigly(x, y, *, ax=None, M=5000, red=calc_env, **kwargs):
    """
    x : the x data
    y : the y data
    ax : axis to plot on
    M : The maximum number of line points to display at any given time
    red : reduction function used to downsample y (e.g. calc_env or calc_rand)
    kwargs : passed to line
    """
    assert x.shape == y.shape, "x and y data must have same shape!"
    if ax is None:
        ax = gca()

    cached = {}

    # Setup line to be drawn beforehand, note this doesn't increment line properties so
    #  style needs to be passed in explicitly
    line = plt.Line2D([],[], **kwargs)
    def update(xmin, xmax):
        """
        Update line data

        precomputes and caches entire line at each level, so initial
        display may be slow but panning and zooming should speed up after that
        """
        # Find nearest power of two as a factor to downsample by
        imin = max(np.searchsorted(x, xmin)-1, 0)
        imax = min(np.searchsorted(x, xmax) + 1, y.shape[0])
        L = imax - imin + 1
        factor = max(2**int(round(np.log(L/M) / np.log(2))), 1)

        # only calculate reduction if it hasn't been cached, do reduction using nearest cached version if possible
        if factor not in cached:
            cached[factor] = red(y, factor=factor)

        ## Make sure lengths match correctly here, by ensuring at least
        #   "factor" points for each x point, then matching y length
        #  this assumes x has uniform sample spacing - but could be modified
        newx = x[imin:imin + ((imax-imin)//factor)* factor:factor]
        start = imin//factor
        newy = cached[factor][start:start + newx.shape[-1]]
        assert newx.shape == newy.shape, "decimation error {}/{}!".format(newx.shape, newy.shape)

        ## Update line data
        line.set_xdata(newx)
        line.set_ydata(newy)

    update(x[0], x[-1])
    ax.add_line(line)
    ## Manually update limits of axis, as adding line doesn't do this
    #   if drawing multiple lines this can quickly slow things down, and some
    #   sort of check should be included to prevent unnecessary changes in limits
    #   when a line is first drawn.
    ax.set_xlim(min(ax.get_xlim()[0], x[0]), max(ax.get_xlim()[1], x[-1]))
    ax.set_ylim(min(ax.get_ylim()[0], np.min(y)), max(ax.get_ylim()[1], np.max(y)))

    def callback(*ignore):
        lims = ax.get_xlim()
        update(*lims)

    ax.callbacks.connect('xlim_changed', callback)

    return [line]

Here's some test code

L=int(100e6)
x=linspace(0,1,L)
y=0.1*randn(L)+sin(2*pi*18*x)
plot_bigly(x,y, red=calc_env)

On my machine this displays very quickly. Zooming has a bit of lag, especially when it's by a large amount. Panning has no issues. Using random selection instead of the min and max is quite a bit faster, and only has issues on very high levels of zoom.
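For comparison, running the same test with the random reducer only requires swapping the `red` argument (a figure() and show() are added here in case you are not in an interactive pylab session):

figure()
plot_bigly(x, y, red=calc_rand)
show()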

user2699
  • Thanks, I didn't consider storing down-sampled versions at different factors like that for later use. I'll probably borrow that and incorporate it into my routine. – Aero Engy Feb 01 '19 at 02:29
2

EDIT: Added parallel=True to numba ... even faster

I ended up making a hybrid of a single-pass argmin+argmax routine and the improved indexing from @a_guest's answer, plus the link to this related simultaneous min/max question.

This version returns the correct x-values for each min/max y pair and, thanks to numba, is actually a little faster than the "fast but not quite correct" version.

from numba import jit, prange
@jit(parallel=True)
def min_max_downsample_v4(x, y, num_bins):
    pts_per_bin = x.size // num_bins
    x_view = x[:pts_per_bin*num_bins].reshape(num_bins, pts_per_bin)
    y_view = y[:pts_per_bin*num_bins].reshape(num_bins, pts_per_bin)    
    i_min = np.zeros(num_bins,dtype='int64')
    i_max = np.zeros(num_bins,dtype='int64')

    for r in prange(num_bins):
        min_val = y_view[r,0]
        max_val = y_view[r,0]
        for c in range(pts_per_bin):
            if y_view[r,c] < min_val:
                min_val = y_view[r,c]
                i_min[r] = c
            elif y_view[r,c] > max_val:
                max_val = y_view[r,c]
                i_max[r] = c                
    r_index = np.repeat(np.arange(num_bins), 2)
    c_index = np.sort(np.stack((i_min, i_max), axis=1)).ravel()        
    return x_view[r_index, c_index], y_view[r_index, c_index]

Comparing the speeds using timeit shows the numba code is roughly 2.6x faster than v1 while providing better results. It is also a little over 10x faster than doing numpy's argmin & argmax in series.

%timeit min_max_downsample_v1(x_big ,y_big ,2000)
96 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit min_max_downsample_v2(x_big ,y_big ,2000)
507 ms ± 4.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit min_max_downsample_v3(x_big ,y_big ,2000)
365 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit min_max_downsample_v4(x_big ,y_big ,2000)
36.2 ms ± 487 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
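A quick way (hypothetical check, using the example arrays from the question) to confirm it picks the same points as v2; note that the first call also includes numba's one-time compilation:

x4, y4 = min_max_downsample_v4(x_big, y_big, 2000)  #first call includes the one-time numba compile
x2, y2 = min_max_downsample_v2(x_big, y_big, 2000)
print(np.array_equal(x2, x4), np.array_equal(y2, y4))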
Aero Engy
  • That's awesome! I wasn't aware that with Numba you can do it all in Python. The resulting code is super easy to read and integrating it is just a decorator away. Thanks for sharing the code :-) – a_guest Feb 01 '19 at 16:59
  • @a_guest I was very surprised by how easy it was to use. Before I added the `prange` and `parallel=True` the time was ~86 ms, which made me happy. After adding those, it is amazing how much faster it is than Numpy alone. It does take about 1/2 second or so to compile on the first use, but this gets called a lot so that doesn't matter at all. – Aero Engy Feb 01 '19 at 18:31
0

Did you try pyqtgraph for interactive plotting? It's more responsive than matplotlib.

One trick I use for downsampling is to use array_split and compute min and max for the splits. The split is done according to the number of samples per pixel of the plot area.
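A rough sketch of that idea (hypothetical names; as the comment below points out, array_split carries much more overhead than the reshape-based versions above):

import numpy as np

def min_max_by_pixel(y, pixel_width):
    """Return interleaved min/max values, one pair per horizontal pixel."""
    chunks = np.array_split(y, pixel_width)  #list of roughly equal-length views
    lows  = np.array([c.min() for c in chunks])
    highs = np.array([c.max() for c in chunks])
    return np.column_stack((lows, highs)).ravel()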

danielhrisca
  • I did try pyqtgraph and it did seem faster. It also has a built-in min/max downsampling routine, which is nice. However, I didn't really like its plot mechanics. If you read the question/answers you'll see that I am doing min/max with a reshape instead of array_split. `reshape` has much less overhead since it returns a single view of the array, whereas `array_split` seems to return a `list` of views. Splitting a 1e8 element array takes ~2ms while reshape is ~520ns ... so reshape is about 3800x faster on my machine. – Aero Engy Feb 04 '19 at 12:30
  • array_split is indeed a lot slower than reshaping – danielhrisca Feb 05 '19 at 14:11