A better suggestion
I think a better vectorized approach would be with slicing -
(series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
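For context, here's a minimal sketch of the kind of loopy get_trend I'm assuming here (modeled on the usual Holt-Winters initial-trend computation; the exact code from the question may differ) -

def get_trend(series, slen):
    # Assumed loopy version: average the slen seasonal differences,
    # each already scaled by slen, matching the sliced expression above.
    total = 0.0
    for i in range(slen):
        total += float(series[i + slen] - series[i]) / slen
    return total / slen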
Runtime test and verification -
In [139]: series = np.random.randint(11,999,(200))
...: slen= 66
...:
# Original approach
In [140]: %timeit get_trend(series, slen)
100000 loops, best of 3: 17.1 µs per loop
# Proposed approach
In [141]: %timeit (series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
100000 loops, best of 3: 3.81 µs per loop
In [142]: out1 = get_trend(series, slen)
In [143]: out2 = (series[slen:2*slen] - series[:slen]).sum()/float(slen**2)
In [144]: out1, out2
Out[144]: (0.7587235996326905, 0.75872359963269054)
Comparing the average-based approach against the loopy one
Let's also test the second approach (the vectorized one) from the question -
In [146]: np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
Out[146]: 0.75872359963269054
The timings are better than the loopy one's and the results match. So, I suspect the way you are timing things might be off.
If you are using NumPy ufuncs to leverage vectorized operations with NumPy, you should work with arrays. So, if your data is a list, convert it to an array first and then use the vectorized approach. Let's investigate this a bit more -
Case #1 : With a list of 200 elems and slen = 66
In [147]: series_list = np.random.randint(11,999,(200)).tolist()
In [148]: series = np.asarray(series_list)
In [149]: slen = 66
In [150]: %timeit get_trend(series_list, slen)
100000 loops, best of 3: 5.68 µs per loop
In [151]: %timeit np.asarray(series_list)
100000 loops, best of 3: 7.99 µs per loop
In [152]: %timeit np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
100000 loops, best of 3: 6.98 µs per loop
Case #2 : Scale it 10x
In [157]: series_list = np.random.randint(11,999,(2000)).tolist()
In [159]: series = np.asarray(series_list)
In [160]: slen = 660
In [161]: %timeit get_trend(series_list, slen)
10000 loops, best of 3: 53.6 µs per loop
In [162]: %timeit np.asarray(series_list)
10000 loops, best of 3: 65.4 µs per loop
In [163]: %timeit np.average(np.subtract(series[slen:2*slen], series[:slen]))/float(slen)
100000 loops, best of 3: 8.71 µs per loop
So, it's the overhead of converting to an array that's hurting you!
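If the data does arrive as a list, one option (just a sketch, with a hypothetical helper name) is to convert it to an array once up front and do all the vectorized work on that array, so the conversion cost is paid only once -

import numpy as np

def initial_trend_from_list(series_list, slen):
    # Hypothetical helper: pay the list-to-array conversion cost once,
    # then do all the slicing and summing on the array.
    series = np.asarray(series_list)
    return (series[slen:2*slen] - series[:slen]).sum() / float(slen**2)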
Comparing the sum-based approach against the average-based one
On the third part, comparing the sum-based code against the average-based one: that's because np.average is indeed slower than doing it "manually" with summation. Timing this as well -
In [173]: a = np.random.randint(0,1000,(1000))
In [174]: %timeit np.sum(a)/float(len(a))
100000 loops, best of 3: 4.36 µs per loop
In [175]: %timeit np.average(a)
100000 loops, best of 3: 7.2 µs per loop
A better option than np.average is np.mean -
In [179]: %timeit np.mean(a)
100000 loops, best of 3: 6.46 µs per loop
Now, looking into the source code for np.average, it seems to be using np.mean under the hood. This explains why it's slower than np.mean: calling np.mean directly avoids that extra function-call overhead. On the tussle between np.sum and np.mean, I think np.mean does take care of overflow when we are adding a huge number of elements, which we might miss with np.sum. So, to be on the safe side, I guess it's better to go with np.mean.
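As a small illustration of that overflow concern (just a sketch; note it forces a narrow accumulator dtype to make the effect visible, since np.sum's default integer accumulator is already the platform integer) -

import numpy as np

# int32 values near the type's limit, so an int32 accumulator overflows easily
a = np.full(10, 2**30, dtype=np.int32)

print(a.sum(dtype=np.int32))   # wraps around to a negative value (overflow)
print(np.mean(a))              # np.mean promotes integer input to float64: 1073741824.0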