
I tried to reproduce the functionality of IPython's %timeit, but for some strange reason the results of testing a particular function are horrific.

IPython:

In [11]: from random import shuffle
   ....: import numpy as np
   ....: def numpy_seq_el_rank(seq, el):
   ....:     return sum(seq < el)
   ....:
   ....: seq = np.array(xrange(10000))
   ....: shuffle(seq)
   ....:

In [12]: %timeit numpy_seq_el_rank(seq, 10000//2)
10000 loops, best of 3: 46.1 µs per loop

Python:

from timeit import timeit, repeat

def my_timeit(code, setup, rep, loops):
    result = repeat(code, setup=setup, repeat=rep, number=loops)
    return '%d loops, best of %d: %0.9f sec per loop'%(loops, rep, min(result))

np_setup = '''
from random import shuffle
import numpy as np
def numpy_seq_el_rank(seq, el):
    return sum(seq < el)

seq = np.array(xrange(10000))
shuffle(seq)
'''
np_code = 'numpy_seq_el_rank(seq, 10000//2)'

print 'Numpy seq_el_rank:\n\t%s'%my_timeit(code=np_code, setup=np_setup, rep=3, loops=100)

And its output:

Numpy seq_el_rank:
    100 loops, best of 3: 1.655324947 sec per loop

As you can see, in plain Python I ran 100 loops instead of the 10,000 in IPython (and got a result ~35,000 times slower), because otherwise it takes a really long time. Can anybody explain why the result in Python is so slow?

Update: here is the output of cProfile.run('my_timeit(code=np_code, setup=np_setup, rep=3, loops=10000)'):

         30650 function calls in 4.987 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    4.987    4.987 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 <timeit-src>:2(<module>)
        3    0.001    0.000    4.985    1.662 <timeit-src>:2(inner)
      300    0.006    0.000    4.961    0.017 <timeit-src>:7(numpy_seq_el_rank)
        1    0.000    0.000    4.987    4.987 Lab10.py:47(my_timeit)
        3    0.019    0.006    0.021    0.007 random.py:277(shuffle)
        1    0.000    0.000    0.002    0.002 timeit.py:121(__init__)
        3    0.000    0.000    4.985    1.662 timeit.py:185(timeit)
        1    0.000    0.000    4.985    4.985 timeit.py:208(repeat)
        1    0.000    0.000    4.987    4.987 timeit.py:239(repeat)
        2    0.000    0.000    0.000    0.000 timeit.py:90(reindent)
        3    0.002    0.001    0.002    0.001 {compile}
        3    0.000    0.000    0.000    0.000 {gc.disable}
        3    0.000    0.000    0.000    0.000 {gc.enable}
        3    0.000    0.000    0.000    0.000 {gc.isenabled}
        1    0.000    0.000    0.000    0.000 {globals}
        3    0.000    0.000    0.000    0.000 {isinstance}
        3    0.000    0.000    0.000    0.000 {len}
        3    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    29997    0.001    0.000    0.001    0.000 {method 'random' of '_random.Random' objects}
        2    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {min}
        3    0.003    0.001    0.003    0.001 {numpy.core.multiarray.array}
        1    0.000    0.000    0.000    0.000 {range}
      300    4.955    0.017    4.955    0.017 {sum}
        6    0.000    0.000    0.000    0.000 {time.clock}
KgOfHedgehogs
  • Have you tried profiling the code to see where the slowness occurs? – Soviut Sep 17 '16 at 00:01
  • @Soviut, no, I have no idea how to profile the timeit.repeat function because it is too complicated for me – KgOfHedgehogs Sep 17 '16 at 00:04
  • Is there some reason you're passing a string code block instead of simply running the same test you did in IPython? – Soviut Sep 17 '16 at 00:30
  • @Soviut: `ipython`'s `%timeit` magic is doing string eval too IIRC (both of them `compile` it once, then time the run, it's not actually recompiling over and over, nor timing the up front compile). I wouldn't worry about any discrepancies there introducing more than a microsecond or two of overhead. – ShadowRanger Sep 17 '16 at 00:51

2 Answers


Well, one issue is that you're misreading the results. ipython is telling you the per-iteration time for the fastest of its three sets of 10,000 iterations. The timeit.repeat function is reporting the total time for the whole round of 100 iterations (again, for the shortest of three). So the real discrepancy is 46.1 µs per loop (ipython) vs. 16.5 ms per loop (python), still a factor of ~350x, but not 35,000x.
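A minimal, hypothetical tweak to the question's my_timeit that normalizes by the loop count would make the two reports directly comparable:

from timeit import repeat

def my_timeit(code, setup, rep, loops):
    # repeat() returns the TOTAL time of each round of `loops` executions,
    # so divide by `loops` to get a per-loop figure like %timeit reports
    result = repeat(code, setup=setup, repeat=rep, number=loops)
    return '%d loops, best of %d: %0.9f sec per loop' % (loops, rep, min(result) / loops)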

You didn't show profiling results for ipython. Is it possible that in your ipython session, you did either from numpy import sum or from numpy import *? If so, you'd have been timing the numpy.sum (which is optimized for numpy arrays and would run several orders of magnitude faster), while your python code (which isolated the globals in a way that ipython does not) ran the normal sum (that has to convert all the values to Python ints and sum them).
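An easy way to check is to look at what the name `sum` is actually bound to in the same ipython session (a quick diagnostic sketch):

import numpy as np

# If the builtin has been shadowed, the first line prints numpy's sum
# instead of <built-in function sum>, and the second prints True
print sum
print sum is np.sum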

If you check your profiling output, virtually all of the work is being done in sum; if that part of your code were sped up by several orders of magnitude, the total time would shrink similarly. That would explain the "real" discrepancy; in a comparable test case it was a 40x difference, and that was for a smaller array (the smaller the array, the less numpy can "show off") with more complex values (vs. summing 0s and 1s here, I believe).
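To see where those orders of magnitude come from, here is a rough side-by-side of the builtin sum versus numpy.sum on the same kind of boolean comparison array (an illustrative sketch, not from the original post):

from timeit import timeit
import numpy as np

seq = np.arange(10000)  # same data as the question; order doesn't matter for summing
mask = seq < 5000       # the boolean array produced by `seq < el`

# builtin sum iterates element by element, boxing every value as a Python object
print 'builtin sum: %f s' % timeit(lambda: sum(mask), number=100)
# numpy's sum performs a single vectorized reduction in C
print 'np.sum:      %f s' % timeit(lambda: np.sum(mask), number=100)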

The remainder (if any) is probably down to the code being eval'd slightly differently, or possibly weirdness with the random shuffle (for consistent tests, you'd want to seed random so the "randomness" is repeatable), but I doubt that's a difference of more than a few percent.
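For instance, a seeded variant of the question's np_setup (a hypothetical sketch, with the np.sum fix applied) makes the shuffle repeatable across runs:

np_setup = '''
from random import seed, shuffle
import numpy as np
def numpy_seq_el_rank(seq, el):
    return np.sum(seq < el)

seed(42)  # fixed seed: the shuffled order is identical on every run
seq = np.array(xrange(10000))
shuffle(seq)
'''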

ShadowRanger
  • The problem was solved when I replaced `sum` with `np.sum`. But in my IPython session I didn't write `from numpy import sum` or `from numpy import *`; I can rerun the tests and get the same results in IPython without those imports. Could it be an Anaconda IPython optimization? – KgOfHedgehogs Sep 17 '16 at 08:26
  • @ЮраМахоткин: Possible. I don't know what nonsense Anaconda might engage in. A general replacement of `sum` with `numpy.sum` would be a pessimization for non-`numpy` types, but for all I know they could have written a `sum` wrapper that dynamically chooses which `sum` to use based on the type of the object. Or you may have your `ipython` config, or `PYTHONSTARTUP` file, or whatever, set to import `numpy.sum` automatically, so it's just going to be slow for non-`numpy` types. Have you checked what `sum` actually _is_ in `ipython`? – ShadowRanger Sep 17 '16 at 13:00

There could be any number of reasons this code runs slower in one implementation of Python than another. One may be optimized differently than the other, one may pre-compile certain parts while the other is fully interpreted. The only way to figure out why is to profile your code.

https://docs.python.org/2/library/profile.html

import cProfile

# `repeat`, `code`, `setup`, `rep` and `loops` are the names defined in the question's script
cProfile.run('repeat(code, setup=setup, repeat=rep, number=loops)')

This will give a result similar to:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.000    0.000 <stdin>:1(testing)
     1    0.000    0.000    0.000    0.000 <string>:1(<module>)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}

This shows which function calls were made, how many times they were made, and how long they took.
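If the default ordering buries the hot spots, cProfile.run also accepts a sort argument, e.g. sorting by internal time:

import cProfile

# sort='tottime' lists the functions doing the most actual work first
cProfile.run('repeat(code, setup=setup, repeat=rep, number=loops)', sort='tottime')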

Soviut
  • And what do the results tell you? To me, it looks like a lot of time is spent converting your `code` block into runnable code. That was one of the things I mentioned about it being interpreted. – Soviut Sep 17 '16 at 00:30
  • @Soviut: Reread it. The string cost is counting everything done within it for `cumtime`, which is not very useful here. Virtually all the `tottime` work is done in the `sum` call. – ShadowRanger Sep 17 '16 at 00:55
  • Also, `ipython` is a wrapper around `python`, it's not a reimplementation. So we're not talking about fundamentally different interpreters here; basically all the real work is being done by the exact same interpreter. – ShadowRanger Sep 17 '16 at 01:02
  • Good points. In either case, though, profiling should yield answers to where the slowness is occurring. – Soviut Sep 17 '16 at 01:27
  • No arguments there. :-) – ShadowRanger Sep 17 '16 at 01:29