Low vectorisation potential on a "forward-dependent-loop" code
Most of your "vectorisation" parallelism is out of the game once the dependency is analysed: each out[n] can be computed only after out[n-1] is known. ( A JIT-compiler cannot vectorise "against" such a dependence barrier either. )
You may pre-calculate some re-used values in a vectorised manner, but there is no direct python syntax ( without an external JIT-compiler workaround ) to arrange a forward-shifting-dependence loop computation into CPU vector-register aligned, co-parallel computation:
    import numpy as np
    from zmq import Stopwatch                    # ok to use pyzmq 2.11 for [usec] .Stopwatch()

    # sig, alpha, beta and out[] are expected to exist already
    aStopWATCH = Stopwatch()                     # a performance measurement .Stopwatch() instance
    sig        = np.abs( sig )                   # one vectorised abs() pass replaces the per-sample np.abs() calls in the loop
    aConst     = ( 1 - alpha )                   # avoids many repetitive SUB(s) in the loop
    for thisPtr in range( 1, len( sig ) ):       # FORWARD-SHIFTING-DEPENDENCE LOOP:
        prevPtr = thisPtr - 1                    # prevPtr->"previous" TimeSlice in out[] ( re-used 2 x len(sig) times )
        if sig[thisPtr] < out[prevPtr]:                                   # 1st re-use
            out[thisPtr] = out[prevPtr] * beta                            # 2nd
        else:
            out[thisPtr] = out[prevPtr] * alpha + ( aConst * sig[thisPtr] ) # 2nd
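To check that the revision changes only the cost and not the result, both loop forms can be wrapped into functions and compared. This is a minimal sketch, not the asker's actual setup: the function names, the zero-initialised out[], the random test signal and the alpha / beta values are all assumptions made for illustration.

```python
import numpy as np

def envelope_original(sig, alpha, beta):
    """Loop as posted in the question: np.abs() and ( 1 - alpha ) are re-evaluated per sample."""
    out = np.zeros(len(sig))                 # assumption: out[0] starts at 0.0
    for n in range(1, len(sig)):
        if np.abs(sig[n]) >= out[n - 1]:
            out[n] = out[n - 1] * alpha + (1 - alpha) * np.abs(sig[n])
        else:
            out[n] = out[n - 1] * beta
    return out

def envelope_revised(sig, alpha, beta):
    """Same recurrence, with the loop-invariant work hoisted out of the loop."""
    mag = np.abs(sig)                        # one vectorised pass over the whole signal
    a_const = 1 - alpha                      # computed once instead of len(sig) times
    out = np.zeros(len(mag))                 # assumption: out[0] starts at 0.0
    for this_ptr in range(1, len(mag)):
        prev = out[this_ptr - 1]             # the forward dependence: needs the previous result
        if mag[this_ptr] < prev:
            out[this_ptr] = prev * beta
        else:
            out[this_ptr] = prev * alpha + a_const * mag[this_ptr]
    return out

rng = np.random.default_rng(0)               # hypothetical test signal
sig = rng.standard_normal(10_000)
alpha, beta = 0.9, 0.99                      # assumed smoothing / decay factors

assert np.allclose(envelope_original(sig, alpha, beta),
                   envelope_revised(sig, alpha, beta))
```

The loop itself stays sequential in both versions; only the work that does not depend on out[n-1] has been moved out of it.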
A good example of vectorised speed-up can be seen in cases where the calculation strategy can be parallelised / broadcast along the 1D, 2D or even 3D structure of a native numpy array. For a speedup of about 100x, see the RGBA-2D matrix accelerated processing in "Vectorised code for a PNG picture processing ( an OpenGL shader pipeline )".
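For contrast, here is a sketch of a case that does vectorise, because every output element depends only on its own inputs and never on a neighbouring result. The image size, the random pixel data and the luma weights are assumptions chosen for illustration; this is not the code from the linked PNG example.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical 64 x 48 RGBA picture, one uint8 per channel
rgba = rng.integers(0, 256, size=(64, 48, 4), dtype=np.uint8)

weights = np.array([0.299, 0.587, 0.114])    # assumed R, G, B luma weights

def gray_loop(img):
    """Pixel-by-pixel python loop over the 2D picture."""
    h, w, _ = img.shape
    out = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            r, g, b = (float(v) for v in img[y, x, :3])
            out[y, x] = 0.299 * r + 0.587 * g + 0.114 * b
    return out

def gray_vectorised(img):
    """Same arithmetic, broadcast over all pixels at once: (h, w, 3) * (3,) -> (h, w, 3)."""
    return (img[..., :3] * weights).sum(axis=-1)

assert np.allclose(gray_loop(rgba), gray_vectorised(rgba))
```

No pixel waits for another pixel's result here, which is exactly what the out[n] / out[n-1] recurrence lacks.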
Performance still increased about 3x
Even this simple python code revision has increased the speed by about 2.8x ( right now, i.e. without undertaking an installation to allow using an ad-hoc JIT-optimising compiler ):
>>> def aForwardShiftingDependenceLOOP():   # proposed code-revision
...     aStopWATCH.start()                  # ||||||||||||||||||.start
...     for thisPtr in range( 1, len( sig ) ):
...         #        |vvvvvvv|------------# FORWARD-SHIFTING-LOOP DEPENDENCE
...         prevPtr = thisPtr - 1         #|vvvvvvv|--STEP-SHIFTING avoids Numpy syntax
...         if ( sig[ thisPtr] < out[prevPtr] ):
...             out[ thisPtr] = out[prevPtr] * beta
...         else:
...             out[ thisPtr] = out[prevPtr] * alpha + ( aConst * sig[thisPtr] )
...     usec = aStopWATCH.stop()            # ||||||||||||||||||.stop
...     print usec, " [usec]"
>>> aForwardShiftingDependenceLOOP()
57593 [usec]
57879 [usec]
58085 [usec]
>>> def anOriginalForLOOP():
...     aStopWATCH.start()
...     for n in range( 1, len( sig ) ):
...         if ( np.abs( sig[n] ) >= out[n-1] ):
...             out[n] = out[n-1] * alpha + ( 1 - alpha ) * np.abs( sig[n] )
...         else:
...             out[n] = out[n-1] * beta
...     usec = aStopWATCH.stop()
...     print usec, " [usec]"
>>> anOriginalForLOOP()
164907 [usec]
165674 [usec]
165154 [usec]