Consider this performance test in IPython under Python 3:
Create a range, a range_iterator and a generator
In [1]: g1 = range(1000000)
In [2]: g2 = iter(range(1000000))
In [3]: g3 = (i for i in range(1000000))
Measure the time for summing each of them using Python's native sum:
In [4]: %timeit sum(g1)
10 loops, best of 3: 47.4 ms per loop
In [5]: %timeit sum(g2)
The slowest run took 374430.34 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 123 ns per loop
In [6]: %timeit sum(g3)
The slowest run took 1302907.54 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 128 ns per loop
I am not sure whether I should worry about the warning. The range version is very slow (why?), while the range_iterator and the generator give similar, much faster timings.
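One thing I am not sure I have ruled out, and which could explain both the warning and the nanosecond timings: unlike a range, the range_iterator and the generator are one-shot, so after the first timing pass they are exhausted and every later call sums an empty iterator. A minimal check in a fresh session:

g1 = range(1000000)
g2 = iter(range(1000000))
g3 = (i for i in range(1000000))

print(sum(g1), sum(g1))  # 499999500000 499999500000 -- a range can be iterated repeatedly
print(sum(g2), sum(g2))  # 499999500000 0 -- the iterator is exhausted after the first pass
print(sum(g3), sum(g3))  # 499999500000 0 -- same for the generator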
Now let's use numpy.sum
In [7]: import numpy as np
In [8]: %timeit np.sum(g1)
10 loops, best of 3: 174 ms per loop
In [9]: %timeit np.sum(g2)
The slowest run took 8.47 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.51 µs per loop
In [10]: %timeit np.sum(g3)
The slowest run took 9.59 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 446 ns per loop
g1 and g3 became roughly 3.5x slower, but the range_iterator g2 is now about 50x slower than it was with the native sum. g3 wins.
In [11]: type(g1)
Out[11]: range
In [12]: type(g2)
Out[12]: range_iterator
In [13]: type(g3)
Out[13]: generator
Why such a penalty for the range_iterator with numpy.sum? Should such objects be avoided? Does this generalize: do "home-made" generators always beat other objects with numpy?
EDIT 1: I realized that np.sum does not actually evaluate the range_iterator but returns another range_iterator object, so this comparison is not meaningful. Why doesn't it get evaluated?
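If I had to guess at the mechanism (I have not checked the NumPy source): np.sum first converts its argument to an array, a range_iterator has no length NumPy can use, so it ends up wrapped in a 0-d object array, and summing a 0-d array just gives back its single element, the iterator itself. A quick check along those lines, assuming a reasonably recent NumPy:

import numpy as np

it = iter(range(1000000))
a = np.asanyarray(it)      # roughly what np.sum builds internally
print(a.dtype, a.shape)    # object () -- a 0-d object array wrapping the iterator
print(type(np.sum(it)))    # <class 'range_iterator'> -- the "sum" is just the wrapped object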
EDIT 2: I also realized that numpy.sum converts the range to an array with the platform's default integer dtype, and therefore gives a wrong result for my sum due to integer overflow (the value in Out[13] below is exactly the correct sum modulo 2**32, i.e. 32-bit wraparound).
In [12]: sum(range(1000000))
Out[12]: 499999500000
In [13]: np.sum(range(1000000))
Out[13]: 1783293664
In [14]: np.sum(range(1000000), dtype=float)
Out[14]: 499999500000.0
Intermediate conclusion: don't use numpy.sum on non-numpy objects...?
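A sketch of what I will probably do instead (assuming the data really starts life as a range): build the array explicitly with a dtype wide enough for the result, then use the vectorized sum:

import numpy as np

a = np.arange(1000000, dtype=np.int64)  # explicit 64-bit dtype avoids the overflow above
print(a.sum())                          # 499999500000 -- correct, and the sum is vectorized

print(sum(range(1000000)))              # 499999500000 -- native sum is also correct, just not vectorized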