python generate segment function

Question

What's the correct method to generate a segmented function with python?

For example:

Just `if`s. I can't see what would be wrong with that. If you wanted constant-time checking on constants, you could use a dictionary. — Carcigenicate, Sep 20 '19 at 19:42
really don't see why the question was downvoted, even if conditionals should be familiar to all developers (python or otherwise) — Liudvikas Akelis, Sep 20 '19 at 19:52
@Carcigenicate How would you use a dict for that? I thought about `range` objects for keys then check `for key in d: if n in key`, but that'd be O(len(keys)), no? — wjandrea, Sep 20 '19 at 19:59
@wjandrea If you were doing something like you'd do in Haskell where you pattern match on a argument to do a piecewise definition, you could do a lookup of the dict. For a more complicated condition though, no, a dict wouldn't be appropriate. — Carcigenicate, Sep 20 '19 at 20:13

wjandrea · Answer 1 · 2019-09-21T13:28:39.897

5

ifs work fine for this. Using a simpler example function here:

def f(n):
    if n in range(0, 128):  # n ∈ [0, 127]
        return n + 6
    elif n in range(128, 896):  # n ∈ [128, 895]
        return 2 * n + 1
    elif n in range(896, 1024):  # n ∈ [896, 1023]
        return 4 * n + 6
    else:
        raise ValueError('n must be in range(0, 1024)')

This is assuming you're using Python 3 and n is an int. Anything else may be relatively slow. See Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?. In other cases, use something like this:

from numbers import Integral

def f(n):
    if not isinstance(n, Integral):  # n ∈ ℤ, as implied by the question
        raise TypeError('n must be an integer')

    if 0 <= n <= 127:  # Literally 0 ≤ n ≤ 127
        return n + 6
    ...

edited Sep 21 '19 at 13:28

answered Sep 20 '19 at 19:50

wjandrea

28,235
9
60
81

1

I believe using `range()` to check this kind of thing can be a huge overhead, because the function actually has to generate an entire range (100s of integers in our case), and check for equality with each one – Liudvikas Akelis Sep 20 '19 at 19:57
1

@Liudvikas That's a good point for Python 2, but I'm assuming Python 3. See [Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?](https://stackoverflow.com/q/30081275/4518341) – wjandrea Sep 20 '19 at 19:58
I stand corrected, didn't know about this functionality! – Liudvikas Akelis Sep 20 '19 at 20:01

score 3 · Accepted Answer · answered Sep 20 '19 at 21:26

I think it's best to directly compare n to the numbers in each segment, like:

def f(n):
    if 0 <= n < 128:  # n ∈ [0, 127]
        return n + 6
    elif 128 <= n < 896:  # n ∈ [128, 895]
        return 2 * n + 1
    elif 896 <= n < 1024:  # n ∈ [896, 1023]
        return 4 * n + 6
    else:
        raise ValueError('n must be in range(0, 1024)')

@wjandrea's answer is very good and there's nothing wrong with that at all, if you prefer that style. However, although range.__contains__ is quick in Python 3, it still requires creating the range object and then discarding it afterward. That has a certain amount of overhead that the if expression doesn't have. Consider that somewhere in range.__contains__ there's still that same if expression, just with more wrapping around it.

timeit bears that out:

> python -m timeit '200 in range(128, 896)'
1000000 loops, best of 3: 0.387 usec per loop

> python -m timeit '128 <= 200 < 896'
10000000 loops, best of 3: 0.0521 usec per loop

Using if x <= n < y is about 7x faster than constructing a range object and querying it.

But we're generally not concerned about microbenchmarks, are we? We're talking about a few microseconds, so raw processing speed probably isn't the most important consideration here. For me, the bigger issue is that if x <= n < y explicitly describing exactly what you mean. Anyone familiar with math immediately understands exactly what that does. range() is not as immediately obvious to the non-programmer: does it include the right side of the interval, or does it not? It doesn't, but while you and I know that, the next person might lose time by having to look that up. It's not as explicit, in my opinion.

Furthermore, the intent of range() is to generate... ranges... that other things might consume. It has the very nice side effect of supporting efficient n in range(...) queries, but that's kind of a happy accident and not the reason it was designed in the first place. To me, it feels kind of wrong to use that as the primary feature of the object.

Again, there's nothing at all wrong with range(). If you love the way that looks, awesome! No one's going to yell at you for it. But -- in my opinion -- it's not the right tool for this job, as there's a simpler, faster, more explicit built-in alternative that was designed for exactly this sort of thing.

So does range: `3.0 in range(2,5)` => `True`. It's even worse in that case, because since 3.0 is not an int, `range.__contains__(3.0)` falls back to `for value in self: if value == 3.0: return True`. I tested with `python -m timeit '895.0 in range(128, 896)'`, which jumped from about 0.4 usec per loop to 33.1 usec per loop. — Kirk Strauser, Sep 20 '19 at 21:59
Actually, I think I misinterpreted you. I read that as "that is a bug", when I think you meant it like "and that's a feature!" — Kirk Strauser, Sep 20 '19 at 22:01
Huh, you're right. So I edited my answer to recommend the explicit comparison more. — wjandrea, Sep 20 '19 at 23:10
@KirkStrauser I was thinking of non-integer floats. `3.1 in range(2, 5) => False`, but you are also correct. — Nicolas Abril, Sep 23 '19 at 18:37

python generate segment function

2 Answers2