
I am working with simulated waveforms and need to greatly reduce their size before saving them on disk.

Simulation data is a bit different from traditional sampled data in that it has a dynamic time step. This allows increasing the time resolution only when something is happening and reducing the amount of data when nothing is, but it is far from perfect. Because the simulator writes a time snapshot whenever something happens somewhere, you get many curves with regularly small time steps that don't need them. That represents a lot of extra data (up to 1000% overhead, translating to 200 GB of junk data per day).

I already have an implementation and it kind of does the trick, but the function is really slow. So slow that it takes about 20% of my processor time (which should be reserved for simulations).

The function I have does the following:

  1. Take 3 contiguous points.
  2. Linearly interpolate the y value of the center point.
  3. Compare the interpolation to the real value (absolute & relative error) .
  4. If the error is small, mark that point for deletion.
  5. Loop back to 1.

Once all points have been tested, delete only the marked points that are not adjacent to another point deleted in this pass (simply deleting all marked points could delete almost everything, since the linear interpolation would lose its reference data). Restart until no (or few) points are deleted.
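
For concreteness, a tiny worked example of steps 1 to 4 using the functions listed below (the numbers are made up for illustration, they are not simulation data):

t0, t1, t2 = 0.0, 1.0, 2.0        #three contiguous time points
y0, y1, y2 = 1.0, 2.0, 3.01       #their values
a = inter(t0, t1, t2, y0, y2)     #the line through the outer points predicts 2.005 at t1
markForRemoval(a, y1, 0.01, 1e-3) #absolute error 0.005 < 0.01, so the middle point gets marked -> True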

Code as follows:

#return true if
#    the difference between the two values a (interpolated) and p (real) is smaller than the absolute error ea, or
#    the relative error between a and p is smaller than the proportional error ep
def markForRemoval(a,p,ea,ep):
    ret = (abs(a-p)<ea)
    if a != 0:
        e = abs(abs(p/a)-1)
        ret = ret or e<ep
    return ret

#linearly interpolate the value at x1 from the straight line through (x0,y0) and (x2,y2)
def inter(x0,x1,x2,y0,y2):
    d = (y0-y2)/(x0-x2)
    return d*(x1-x0)+y0

def checkBeforeDel(showMe):
    last = True #first element is always kept
    for idx in range(len(showMe)):
        if showMe[idx] == False and last == False: #if the previous element is marked for removal, do not accept a second removal
            showMe[idx] = True #this element is un-marked and won't be deleted in this step
        last = showMe[idx] #tells whether this iteration resulted in a removal or not
    return showMe

def decimate(t,d,ea,ep):
    #get time shifted vector so that a list comprehension can be used
    zipped = zip(t[2:],t[1:-1],t[:-2],d[2:],d[1:-1],d[:-2])
    #build a boolean mask for NumPy indexing; invert "mark for removal" to get a show/hide mask instead
    show = np.concatenate(
        ([True],#always keep first and last element
          ~np.array([markForRemoval(inter(t0,t1,t2,d0,d2), d1, ea, ep) for [t0,t1,t2,d0,d1,d2] in zipped]),
         [True]))

    show = checkBeforeDel(show)
    tret = t[show]#decimate the time vector using constructed mask
    dret = d[show]#decimate the data vector using constructed mask
    dec = len(t)-np.sum(show)#tell how many points have been decimated
    return (tret, dret, dec)#return the whole stuff

and for the actual run:

dec = 10000
tt = time #vector with 10 million points
dd = Vout #same length as above; plt.plot(time,Vout) plots the curve
while dec>=1000:#keep going until there are fewer than 1k removals (1k is peanuts)
    [tt,dd, dec] = decimate(tt,dd,ErrorAbsolute,1e-3)
#decimation is over, retrieve the data
decVoutTime = tt
decVoutDat = dd

For 10 million points, this takes between 10 and 50 seconds on my PC, depending on how many points are removed. At the beginning it removes almost half the data - but the count drops with each run (which is why the "last" 1000 points are spared).

There are a few caveats to consider for this case:

  1. This is not a true decimation - I don't take out one sample in every X.
  2. No two contiguous samples can be removed in one run - even though a later run may remove samples that were previously spared (see the small example after this list).
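
To make caveat 2 concrete, this is what checkBeforeDel does to a toy keep-mask (illustrative values only, True means "keep", np being NumPy as in the code above):

show = np.array([True, False, False, False, True, False, True])
checkBeforeDel(show)
#-> [True, False, True, False, True, False, True]
#the middle one of the three consecutive marked samples is un-marked,
#so no two neighbouring samples disappear in the same pass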

I see three points for optimization, but I can't tell how to check which one I should focus on (I know how to time the whole thing - not a subset).

  1. markForRemoval function.
  2. The list comprehension. I'm a C/ASM guy and Python is burning my brain - there are too many ways to do the same thing. In the end I'm not even sure this construct is speed friendly (a rough array-based sketch follows after this list).
  3. The pre-decimation check for contiguous samples. I would have loved to fold this one into the list comprehension above, even if just to satisfy my craving for code elegance. What is sure is that with the current construct I loop twice over the same vector, one time for nothing.
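
For reference, this is the kind of array-based rewrite I have in mind for points 1 and 2. It is only a sketch: the name mark_vectorized is mine and not part of the code above, it assumes t and d are NumPy arrays, and True here means "remove" (the opposite of the show mask):

import numpy as np

def mark_vectorized(t, d, ea, ep):
    t0, t1, t2 = t[2:], t[1:-1], t[:-2]            #same shifted views as the zip in decimate
    d0, d1, d2 = d[2:], d[1:-1], d[:-2]
    a = (d0 - d2) / (t0 - t2) * (t1 - t0) + d0     #same formula as inter(), for all triples at once
    ret = np.abs(a - d1) < ea                      #absolute error test
    with np.errstate(divide='ignore', invalid='ignore'):
        rel = np.abs(np.abs(d1 / a) - 1)           #relative error, as in markForRemoval()
    ret = ret | ((a != 0) & (rel < ep))            #only trust the relative test where a is not zero
    return np.concatenate(([False], ret, [False])) #never mark the first and last point

The keep mask would then be show = ~mark_vectorized(t, d, ea, ep), if I got the logic right.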

edit: added the missing inter function above, plus the profiler code and its output below

import cProfile

def decimateAll(t,d,ea,ep,stopAt):
    dec=stopAt
    while dec>=stopAt:
        [t,d, dec] = decimate(t,d,ea,ep)
    return (t,d)

cProfile.run('decimateAll(time,Vout,ErrorAbsolute,ErrorProportionnal,1000)')

49992551 function calls in 41.260 seconds

Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  9998483    5.609    0.000    5.609    0.000 <ipython-input-24-22d73b8bfa1a>:12(inter)
       12   15.999    1.333   15.999    1.333 <ipython-input-24-22d73b8bfa1a>:17(checkBeforeDel)
  9998483   10.750    0.000   12.524    0.000 <ipython-input-24-22d73b8bfa1a>:4(markForRemoval)
       12    0.140    0.012   41.252    3.438 <ipython-input-25-3960a7141d64>:1(decimate)
        1    0.008    0.008   41.260   41.260 <ipython-input-25-3960a7141d64>:25(decimateAll)
       12    6.515    0.543   24.647    2.054 <ipython-input-25-3960a7141d64>:7(<listcomp>)
        1    0.000    0.000   41.260   41.260 <string>:1(<module>)
       12    0.000    0.000    0.010    0.001 fromnumeric.py:1821(sum)
       12    0.000    0.000    0.010    0.001 fromnumeric.py:64(_wrapreduction)
 29995449    1.774    0.000    1.774    0.000 {built-in method builtins.abs}
        1    0.000    0.000   41.260   41.260 {built-in method builtins.exec}
       12    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       12    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       12    0.452    0.038    0.452    0.038 {built-in method numpy.core.multiarray.array}
       12    0.005    0.000    0.005    0.000 {built-in method numpy.core.multiarray.concatenate}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       12    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
       12    0.010    0.001    0.010    0.001 {method 'reduce' of 'numpy.ufunc' objects}
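
(As an aside, the report above is ordered by function name; cProfile.run accepts a sort argument to order it by cost instead, e.g.:)

cProfile.run('decimateAll(time,Vout,ErrorAbsolute,ErrorProportionnal,1000)', sort='tottime')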

If I've understood the profiler output properly, the function that makes sure no two contiguous elements are deleted (checkBeforeDel) is creating a huge overhead. That is completely contrary to my expectations.
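
If checkBeforeDel really is the culprit, maybe its Python loop can be replaced by a shifted-mask operation. A rough, untested sketch of my own (assuming the mask is a NumPy bool array, as built in decimate): it is not strictly equivalent - for a long run of marked points it un-marks more of them per pass than the loop does - but those surviving points would be caught by a later pass anyway.

import numpy as np

def checkBeforeDelVectorized(showMe):
    marked = ~showMe                              #True = marked for removal
    prev = np.concatenate(([False], marked[:-1])) #was the previous sample marked?
    return ~(marked & ~prev)                      #only remove a point whose predecessor is kept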

  • Even though it sounds like you may have already done it, if not then see [How can you profile a Python script?](https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script). Beyond improving the current implementation, the only way to do more would be to find and implement a better algorithm. – martineau Sep 24 '20 at 23:26
  • Finding a better algorithm is my question, or at least recommendations on how to improve it - I wouldn't dare to say I got the most optimized version on the first try. Profiling (thanks for the link, btw) is the icing on the cake, but even so, without an idea of how to speed up any part of the code it is not really solving my issue. – Mr Buisson Sep 25 '20 at 06:21
