What's the correct method to generate a segmented function with python?
For example:
if
s work fine for this. Using a simpler example function here:
def f(n):
if n in range(0, 128): # n ∈ [0, 127]
return n + 6
elif n in range(128, 896): # n ∈ [128, 895]
return 2 * n + 1
elif n in range(896, 1024): # n ∈ [896, 1023]
return 4 * n + 6
else:
raise ValueError('n must be in range(0, 1024)')
This is assuming you're using Python 3 and n
is an int
. Anything else may be relatively slow. See Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?. In other cases, use something like this:
from numbers import Integral
def f(n):
if not isinstance(n, Integral): # n ∈ ℤ, as implied by the question
raise TypeError('n must be an integer')
if 0 <= n <= 127: # Literally 0 ≤ n ≤ 127
return n + 6
...
I think it's best to directly compare n
to the numbers in each segment, like:
def f(n):
if 0 <= n < 128: # n ∈ [0, 127]
return n + 6
elif 128 <= n < 896: # n ∈ [128, 895]
return 2 * n + 1
elif 896 <= n < 1024: # n ∈ [896, 1023]
return 4 * n + 6
else:
raise ValueError('n must be in range(0, 1024)')
@wjandrea's answer is very good and there's nothing wrong with that at all, if you prefer that style. However, although range.__contains__
is quick in Python 3, it still requires creating the range
object and then discarding it afterward. That has a certain amount of overhead that the if
expression doesn't have. Consider that somewhere in range.__contains__
there's still that same if
expression, just with more wrapping around it.
timeit
bears that out:
> python -m timeit '200 in range(128, 896)'
1000000 loops, best of 3: 0.387 usec per loop
> python -m timeit '128 <= 200 < 896'
10000000 loops, best of 3: 0.0521 usec per loop
Using if x <= n < y
is about 7x faster than constructing a range
object and querying it.
But we're generally not concerned about microbenchmarks, are we? We're talking about a few microseconds, so raw processing speed probably isn't the most important consideration here. For me, the bigger issue is that if x <= n < y
explicitly describing exactly what you mean. Anyone familiar with math immediately understands exactly what that does. range()
is not as immediately obvious to the non-programmer: does it include the right side of the interval, or does it not? It doesn't, but while you and I know that, the next person might lose time by having to look that up. It's not as explicit, in my opinion.
Furthermore, the intent of range()
is to generate... ranges... that other things might consume. It has the very nice side effect of supporting efficient n in range(...)
queries, but that's kind of a happy accident and not the reason it was designed in the first place. To me, it feels kind of wrong to use that as the primary feature of the object.
Again, there's nothing at all wrong with range()
. If you love the way that looks, awesome! No one's going to yell at you for it. But -- in my opinion -- it's not the right tool for this job, as there's a simpler, faster, more explicit built-in alternative that was designed for exactly this sort of thing.