This surprises me a bit. I've been comparing the performance of a few ways to sum a range.

In [1]: import numpy as np

In [2]: %timeit a = np.sum(range(100000))
Out[2]: 100 loops, best of 3: 16.7 ms per loop

In [3]: %timeit a = np.sum([range(100000)])
Out[3]: 100 loops, best of 3: 16.7 ms per loop

In [4]: %timeit a = np.sum([i for i in range(100000)])
Out[4]: 100 loops, best of 3: 12 ms per loop

In [5]: %timeit a = np.sum((i for i in range(100000)))
Out[5]: 100 loops, best of 3: 8.43 ms per loop

I'm trying to understand the inner workings and to learn how to generalize this into a best practice. Numbering the four timed statements 1 to 4 in order: why is 4 (building a new generator) better than 1?

I understand why creating a list takes more time. But then, why is 3 better than 2? And why isn't 2 worse than 1? Is a list being built in 1?

I'm using from numpy import *.

Matt
Aguy
  • `sum([range(100000)])` is a `TypeError`, so I'm not sure you're making useful comparisons here... – jonrsharpe Jul 08 '16 at 07:53
  • @jonrsharpe. Aha! Your comment led me to think that my `from numpy import *` has something to do with it... – Aguy Jul 08 '16 at 08:00
  • And that's why you never use `from thing import *`... Also clarifying which version of Python would be helpful, as `range` is different between 2.x and 3.x. – jonrsharpe Jul 08 '16 at 08:01
  • @Theguy: numpy defines its own `sum` function. – BrenBarn Jul 08 '16 at 08:01
  • @jonrsharpe, this is the default in spyder. I'm on python 3. – Aguy Jul 08 '16 at 08:06
  • Regarding the numpy import, see Jake VanderPlas's tweets: https://twitter.com/jakevdp/status/718540782987116544 & https://twitter.com/jakevdp/status/719524939376668672. Basically, the import is clobbering the builtin `sum()` function. – DocZerø Jul 08 '16 at 08:15

1 Answer

Running the same code, I get these results (Python 3.5.1):

%timeit a = sum(range(100000))
100 loops, best of 3: 3.05 ms per loop

%timeit a = sum([range(100000)])
>>> TypeError: unsupported operand type(s) for +: 'int' and 'range'

%timeit a = sum([i for i in range(100000)])
100 loops, best of 3: 8.12 ms per loop

%timeit a = sum((i for i in range(100000)))
100 loops, best of 3: 8.97 ms per loop

Now with numpy's sum() implementation:

from numpy import sum

%timeit a = sum(range(100000))
10 loops, best of 3: 19.7 ms per loop

%timeit a = sum([range(100000)])
10 loops, best of 3: 20.2 ms per loop

%timeit a = sum([i for i in range(100000)])
100 loops, best of 3: 16.2 ms per loop

%timeit a = sum((i for i in range(100000)))
100 loops, best of 3: 9.27 ms per loop

What's happened is that by using from numpy import * (or from numpy import sum) you're clobbering Python's built-in sum() function.
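
To make the name clash concrete, here is a minimal sketch (not taken from the original post) of which function the bare name sum refers to after the import. It assumes NumPy is installed; the small range(5) input and the variable names are illustrative only.

```python
import builtins

import numpy as np

# Same effect on the name `sum` as the imports discussed above:
# the bare name now points at NumPy's function, not the built-in.
from numpy import sum

print(sum is np.sum)        # True
print(sum is builtins.sum)  # False

# NumPy converts its argument to an array first, so a list holding one
# range becomes a 1 x 5 array and every element is summed.
total = sum([range(5)])
print(total)  # 10

# The built-in starts from the integer 0 and tries 0 + range(0, 5),
# which is the TypeError shown in the timings above.
try:
    builtins.sum([range(5)])
    builtin_failed = False
except TypeError as exc:
    builtin_failed = True
    print("built-in sum raised:", exc)

# The built-in stays reachable through the builtins module.
builtin_total = builtins.sum(range(5))
print(builtin_total)  # 10
```

Writing import numpy as np and calling np.sum explicitly keeps both functions available under distinct names.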

Have a look at this SO question, which discusses the performance difference between the two implementations.
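
Outside IPython, the comparison can be reproduced with the standard timeit module. The following is a rough sketch rather than a rigorous benchmark: the helper time_ms, the repeat counts, and the extra np.arange case (summing data that is already an array) are my own additions, and the numbers will vary by machine and NumPy version.

```python
import builtins
import timeit

import numpy as np

N = 100_000
data = range(N)                     # plain Python range
arr = np.arange(N, dtype=np.int64)  # the same values, already a NumPy array


def time_ms(fn, repeat=3, number=10):
    """Best-of-`repeat` average time per call, in milliseconds (hypothetical helper)."""
    best = min(timeit.repeat(fn, repeat=repeat, number=number))
    return best / number * 1e3


timings = {
    "builtins.sum(range)": time_ms(lambda: builtins.sum(data)),
    "np.sum(range)": time_ms(lambda: np.sum(data)),  # includes range -> array conversion
    "np.sum(ndarray)": time_ms(lambda: np.sum(arr)),
}

for label, ms in timings.items():
    print(f"{label:>20}: {ms:.3f} ms per loop")
```

Naming builtins.sum and np.sum explicitly keeps the comparison unambiguous regardless of what has been imported into the namespace.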

Community
DocZerø