Why is creating a range from 0 to log(len(list), 2) so slow?

Question

I don't have a clue why is this happening. I was messing with some lists, and I needed a for loop going from 0 to log(n, 2) where n was the length of a list. But the code was amazingly slow, so after a bit a research I found that the problem is in the range generation. Sample code for demonstration:

n = len([1,2,3,4,5,6,7,8])
k = 8
timeit('range(log(n, 2))', number=2, repeat=3) # Test 1
timeit('range(log(k, 2))', number=2, repeat=3) # Test 2

The output

2 loops, best of 3: 2.2 s per loop
2 loops, best of 3: 3.46 µs per loop

The number of tests is low (I didn't want this to be running more than 10 minutes), but it already shows that range(log(n, 2)) is orders of magnitude slower than the counterpart using just the logarithm of an integer. This is really surprising and I don't have any clue on why is this happening. Maybe is a problem on my PC, maybe a Sage problem or a Python bug (I didn't try the same on Python).

Using xrange instead of range doesn't help either. Also, if you get the number with .n(), test 1 runs at the same speed of 2.

Does anybody know what can be happening? Thanks!

Sounds like a Sage (maybe cython?) problem. Python `range` doesn't even take floats. — Pavel Anossov, Apr 29 '13 at 22:37
And Python also doesn't have `log` in the global namespace (and no way to get it there without adding a `setup` to `timeit`). And `n` isn't available to `timeit` either. And there's no `repeat` parameter on `timeit` (which I assume you got with `from timeit import timeit`). — abarnert, Apr 29 '13 at 22:43
Doesn't your output rather display that the values your `timeit` returns are rather random? After all you tried the same thing twice (both `n` and `k` are 8), and got massively varying results. — Alfe, Apr 29 '13 at 22:44
"Also, if you get the number with `.n()`". Wait, what? Get _what_ number, from where? AFAIK, Sage is built on top of ipython, and all of its "magic" syntax starts with `%` or `!`. — abarnert, Apr 29 '13 at 22:48
@abarnert: `.n()` is a method of Sage objects which produces a numerical value, typically a `RealNumber` instance. — DSM, Apr 29 '13 at 23:57
@DSM: OK. So `n` isn't really a number, but some kind of expression that gets passed around and evaluated lazily as needed (possibly so it can be optimized at the site of strict evaluation or something?), and `n.n()` forces strict evaluation? — abarnert, Apr 30 '13 at 00:05
@abarnert: well, here `n` is simply `int(8)`. I think the OP was referring to calling the `.n()` method on the output of one of the expressions to get it out of SR (the symbolic ring where expressions like `log(8)/log(2)` live). — DSM, Apr 30 '13 at 00:13
@abarnert as @DSM says, the `.n()` method is a Sage-specific method which takes a float out of an expression like `log(n, 2)`. I should have clarified in the question that this question is specific to Sage (the sage tag was ignored in the title). — gjulianm, Apr 30 '13 at 10:06

score 13 · Accepted Answer · answered Apr 29 '13 at 23:56

Good grief -- I recognize this one. It's related to one of mine, trac #12121. First, you get extra overhead from using a Python int as opposed to a Sage Integer for boring reasons:

sage: log(8, 2)
3
sage: type(log(8, 2))
sage.rings.integer.Integer
sage: log(8r, 2)
log(8)/log(2)
sage: type(log(8r, 2))
sage.symbolic.expression.Expression
sage: %timeit log(8, 2)
1000000 loops, best of 3: 1.4 us per loop
sage: %timeit log(8r, 2)
1000 loops, best of 3: 404 us per loop

(The r suffix means "raw", and prevents the Sage preparser from wrapping the literal 2 into Integer(2))

And then it gets weird. In order to produce an int for range to consume, Sage has to figure out how to turn log(8)/log(2) into 3, and it turns out that she does the worst thing possible. Plagiarizing my original diagnosis (mutatis mutandis):

First she checks to see if this object has its own way to get an int, and it doesn't. So she builds a RealInterval object out of log(8)/log(2), and it turns out that this is about the worst thing she could do! She checks to see whether the lower and upper parts of the interval agree [on the floor, I mean] (so that she knows for certain what the floor is). But in this case, because it really is an integer! this is always going to look like:

sage: y = log(8)/log(2)
sage: rif = RealIntervalField(53)(y)
sage: rif
3.000000000000000?
sage: rif.endpoints()
(2.99999999999999, 3.00000000000001)

These two bounds have floors which aren't aren't equal, so Sage decides she hasn't solved the problem yet, and she keeps increasing the precision to 20000 bits to see if she can prove that they are.. but by construction it's never going to work. Finally she gives up and tries to simplify it, which succeeds:

sage: y.simplify_full()
3

Proof without words that it's a perverse property of the exactly divisible case:

sage: %timeit range(log(8r, 2))
1 loops, best of 3: 2.18 s per loop
sage: %timeit range(log(9r, 2))
1000 loops, best of 3: 766 us per loop
sage: %timeit range(log(15r, 2))
1000 loops, best of 3: 764 us per loop
sage: %timeit range(log(16r, 2))
1 loops, best of 3: 2.19 s per loop

Nice diagnosis... are you going to write a patch for it now? :) — kcrisman, Apr 30 '13 at 00:42

score 1 · Answer 2 · edited May 23 '14 at 06:50

Python 2 allows range(some_float), but its deprecated and doesn't work in python 3.

The code sample doesn't give the output specified. But we can walk through it. First, timeit needs a full script, the import in the script calling timeit is not used:

>>> timeit('range(log(8,2))')
  Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/timeit.py", line 226, in timeit
    return Timer(stmt, setup, timer).timeit(number)
  File "/usr/lib64/python2.6/timeit.py", line 192, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
NameError: global name 'log' is not defined

If you add the import to the script being timed, it includes the setup time:

>>> timeit('from math import log;range(log(8,2))')
3.7010221481323242

If you move the import to the setup, its better, but timing a one-shot is notoriously inaccurate:

>>> timeit('range(log(8,2))',setup='from math import log')
1.9139349460601807

Finally, run it a bunch of times and you get a good number:

>>> timeit('range(log(8,2))',setup='from math import log',number=100)
0.00038290023803710938

It looks like the Sage notebook imports `log` into globals. (And the OP is already using `number` on his `timeit`.) — abarnert, Apr 29 '13 at 22:52

score 1 · Answer 3 · answered Apr 29 '13 at 22:58

This looks like it's a Sage bug.

I created a new notebook and did this:

n = len([1,2,3,4,5,6,7,8])
k = 8
timeit('range(log(n, 2))', number=2, repeat=3) # Test 1
timeit('range(log(len([1,2,3,4,5,6,7,8]), 2))', number=2, repeat=3) # Test 1.5
timeit('range(log(k, 2))', number=2, repeat=3) # Test 2

Test 1.5 is just as slow as test 1. But if you break it down in any way—take off the range, or even add m=n+0 and use m instead of n, it drops down to microseconds.

So clearly, Sage is trying to do something complicated here while evaluating the expression, and getting confused.

To verify this, in plain old ipython:

n = len([1,2,3,4,5,6,7,8])
k = 8
%timeit 'range(log(n, 2))'
%timeit 'range(log(len([1,2,3,4,5,6,7,8]), 2))'
%timeit 'range(log(k, 2))'

They're all equally fast, as you'd expect.

So… what do you do about it?

Well, you may want to try to track down the Sage bug and file it upstream. But meanwhile, you probably want a workaround in your code.

As noted above, just doing m = n+0 and using m instead of n seems to speed it up. See if that works for you?

Yes, the workaround was as easy as computing the value of the logarithm with `log(n, 2).n()`. This converts the expression `log(8)/log(2)` to a number (3) thus speeding execution. — gjulianm, Apr 30 '13 at 10:10

Alfe · Answer 4 · 2013-04-29T22:57:00.427

-1

Maybe using log(x, 2) (aka ld()) isn't a good idea in the first place. I'd propose to use shifting the int values to implement the ld():

n = len(array)
while n:
  n >>= 1
  # perform the loop stuff

This way you might avoid all these uglinesses with the range() and the log().

In normal situations calling log() should take more time than simple bit shifting on an int. Examples:

>>> timeit('for i in range(int(math.log(8, 2))): pass', setup='import math')
0.6762251853942871
>>> timeit('n = 8\nwhile n:\n  n >>= 1')
0.24107813835144043

With larger values for n the difference gets smaller. For n = 10000 I got 0.8163230419158936 and 0.8106038570404053, but that should be because then the loop body will take the majority of the time, compared to the loop initialization.

edited Apr 29 '13 at 22:57

answered Apr 29 '13 at 22:48

Alfe

56,346
20
107
159

I'm willing to be that this writing a right loop around bit-shifting in Python will be significantly slower than calling `log`. But whether I'm right or wrong, you definitely shouldn't be asserting that it's faster without even attempting to test. – abarnert Apr 29 '13 at 22:49
Maybe you just shouldn't assume I didn't test. That bit shifting isn't slower than additions or other simple arithmetics in today's processors is part of my basic knowledge. But I added some test to prove my point. – Alfe Apr 29 '13 at 22:57
OK, so you timed apples and oranges, rather than not timing anything. The first version is not even remotely equivalent to the second, because it has an extra `for i in range(…)` loop that the other does not (which kills all of the advantage of not having a tight loop in Python). – abarnert Apr 29 '13 at 23:00
The other has the `while` loop as replacement, as my answer proposes. – Alfe Apr 29 '13 at 23:01
Try `from math import log` timings – jamylak Apr 29 '13 at 23:08
I'm not sure what exactly you mean. In case you think that the import statement is so costly that it distorts the timing measurement: `timeit()` does not take the `setup` code into the timing. Whatever time is consumed by importing `log` is not taken into account. – Alfe Apr 29 '13 at 23:12
Well, there are some drawbacks using bitshift: 1) I can't use it in list comprehension. 2) It's more obscure, you know inmediately what `range(log(n, 2))` does when you see it. 3) This is Sage, not only python (I should have clarified it better) so it's actually better to use `log(n, 2)`, as n could be of another type other than int, which probably doesn't have a bit-shift operation. – gjulianm Apr 30 '13 at 10:24
① Yes you can; just create a generator yielding the shifted n. (`def foo(n): while n: yield n; n >>= 1`) ② I had the feeling that the origin of what you were doing was to do something for as long as there are counting binary digits in that n (and therefore using that `ld()`). If you needed that `log(x, 2)` not in that context, you are right (but then the task behind that would be interesting). ③ Sizes of lists are always natural numbers (you can't have 1.4 elements in a list) which always can be represented as binary numbers which in turn always can be shifted bitwise (correct me). – Alfe Apr 30 '13 at 11:52
But anyway, I never meant: "Do it like this." I only wanted to point out a different view on what you were doing. Whether it fits your needs or not, I cannot decide. – Alfe Apr 30 '13 at 11:53

Why is creating a range from 0 to log(len(list), 2) so slow?

4 Answers4

Linked