
With r = range(1500, 2500), I benchmarked x in r for x below/in/above the range:

1000 in r :   58 ns ± 0 ns
2000 in r :  101 ns ± 1 ns
3000 in r :   58 ns ± 0 ns

Checking 1000 is faster than checking 2000? Makes sense, as for 1000, Python knows the result after only checking the range's lower bound, whereas for 2000 it needs to check both bounds.

Checking 3000 is faster than checking 2000? Makes sense, as for 3000, Python knows the result after only checking the range's upper bound, whereas for 2000 it needs to check both bounds.

Hey... waaaiiit a minute...

How does it know which bound to check first? How can 1000 and 3000 both be checked faster than 2000?
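For context, the "checks both bounds in order" theory corresponds to a short-circuiting chained comparison, sketched here in pure Python (my illustration, not CPython's actual code). It would predict 1000 being fast (the first comparison already settles it) but 3000 being slow (both comparisons must run):

```python
def naive_contains(r, x):
    # Chained comparisons short-circuit left to right:
    # for x < r.start the second comparison never runs,
    # but for x >= r.stop both comparisons must be evaluated.
    return r.start <= x < r.stop

r = range(1500, 2500)
print(naive_contains(r, 1000))  # False, after one comparison
print(naive_contains(r, 3000))  # False, but only after two comparisons
```

So under that theory the timings for 1000 and 3000 should differ, which they don't.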

Benchmark code (Try it online!):

from timeit import repeat
from statistics import mean, stdev

setup = 'r = range(1500, 2500)'

n = 10**4
for _ in range(3):
    for x in 1000, 2000, 3000:
        code = f'{x} in r'
        ts = repeat(code, setup, number=n, repeat=100)
        ts = [t/n * 1e9 for t in sorted(ts)[:10]]
        print(code, ':  %3d ns ± %d ns' % (mean(ts), stdev(ts)))
    print()
Kelly Bundy
    It's almost certainly just checking `if 1500 <= x < 2500:`. The Python compiler decides which order to check. – Barmar Jun 22 '22 at 16:35
  • @Barmar That explanation disagrees with the times. As discussed in the question. (Also, it's not true.) – Kelly Bundy Jun 22 '22 at 16:36
  • What's the value to investigating undefined behavior (insofar as the language doesn't define performance expectations for these operations)? There's no guarantee that the answer will be true in future interpreter releases. It's important to know what the big-O guarantees are on how code will scale, but if you need to know what the trivial constant factors are, it's probably time to move to a language where those _are_ well-defined (even if in the sense of "guaranteed to compile down to, and have performance of, specific assembly instructions"). – Charles Duffy Jun 22 '22 at 16:43
    Python is open-source, you can easily look at how it's implemented. – Barmar Jun 22 '22 at 16:44
  • This could answer some questions (or pose new ones): https://github.com/python/cpython/blob/main/Objects/rangeobject.c – fsimonjetz Jun 22 '22 at 16:44
  • The [:10] looks fishy. What does that do? Only look at the 10 slowest runs? – Sören Jun 22 '22 at 16:46
  • @CharlesDuffy Curiosity, figuring out why the above theory is wrong, and getting reminded about a certain aspect. – Kelly Bundy Jun 22 '22 at 16:47
  • @Sören The ten *fastest*. The doc for `repeat` suggests to just take the minimum, but I prefer to look at a few more and their stdev, then I know better whether the machine happened to give me a stable run. – Kelly Bundy Jun 22 '22 at 16:52
  • @Barmar Yes, looking up the code was the next thing I did. – Kelly Bundy Jun 22 '22 at 16:54
  • @CharlesDuffy That certain aspect being the step/stride. I suspected I wouldn't be the only one forgetting about that. And judging by the votes for chepner's answer, I believe I was correct in deeming this interesting enough to share. – Kelly Bundy Jun 22 '22 at 17:52

2 Answers

8

In CPython, the implementation first checks the value against both endpoints:

if (cmp1 == 1) { /* positive steps: start <= ob < stop */
    cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
    cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
}

If either is false, it can exit immediately:

if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
    result = 0;
    goto end;
}

If both checks are true, though, it then has to check if the stride rejects the value. (For example, 2000 in range(1501, 2500) is true, but 2000 in range(1501, 2500, 2) is false.)

tmp1 = PyNumber_Subtract(ob, r->start);
if (tmp1 == NULL)
    goto end;
tmp2 = PyNumber_Remainder(tmp1, r->step);
if (tmp2 == NULL)
    goto end;
/* result = ((int(ob) - start) % step) == 0 */
result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);

It's this last step that makes 2000 in range(1500, 2500) slower than either out-of-bound check.
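Putting the three C snippets together, the logic is roughly equivalent to this pure-Python sketch (an approximation for illustration, not the exact CPython code):

```python
def range_contains(r, x):
    # Bounds test: both endpoint comparisons are made,
    # then out-of-range values exit immediately.
    if r.step > 0:
        if not (r.start <= x < r.stop):
            return False
    else:
        if not (r.stop < x <= r.start):
            return False
    # In-range values still pay for the stride test,
    # which is what makes them slower to check.
    return (x - r.start) % r.step == 0

r = range(1500, 2500)
print(range_contains(r, 1000), range_contains(r, 2000), range_contains(r, 3000))
# False True False
```

Both out-of-range values take the same cheap path, so 1000 and 3000 are equally fast; only 2000 reaches the subtraction and remainder.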

(See https://github.com/python/cpython/blob/main/Objects/rangeobject.c#L381 for the full implementation.)

chepner
  • I'm surprised that there's no special case for stride=1. – Barmar Jun 22 '22 at 16:48
  • Maybe the `(ob - start) % stride` for `stride=1` is cheap enough it's not worth adding a branch to avoid – Useless Jun 22 '22 at 16:51
  • I think the current implementation is simpler (and possibly not significantly slower) than the code needed to determine *which* special case (stride 1 for increasing ranges, stride -1 for decreasing ranges) to check. – chepner Jun 22 '22 at 16:52
  • Or, such a change isn't a big enough improvement to convince the developers to accept a pull request :) – chepner Jun 22 '22 at 16:54
0

If we look at the CPython implementation, we can see that the range_contains_long function does indeed contain a quick bounds check, specialized for both positive and negative steps.

This explains why the outer cases are both much faster, but of course, it's specific to a single implementation and could change at any time.
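One caveat, easy to verify at the prompt: that fast path in range_contains_long only applies to exact ints. Other objects (a float, say) fall back to the generic linear scan over the range, so the constant-time behaviour is not universal:

```python
r = range(1500, 2500)
# Exact ints take the O(1) bounds-and-stride path,
# even for ranges far too large to iterate:
print(2000 in r)               # True
print(10**17 in range(10**18)) # True, computed without iteration
# A float compares equal to 2000, so membership is still True,
# but it is found by scanning the range, not by the fast path.
print(2000.0 in r)             # True
```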

Useless