Is it conclusive that scipy.misc.comb is now indeed faster than the ad-hoc implementation?
According to an old answer, Statistics: combinations in Python, this homebrew function is faster than scipy.misc.comb when calculating the number of combinations nCr:
def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0
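As a quick sanity check of what it returns, a few example calls (the input values here are just for illustration):

print choose(5, 2)    # 10
print choose(52, 5)   # 2598960 -- number of 5-card hands from a 52-card deck
print choose(3, 5)    # 0, since k > n falls through to the else branch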
But after running some tests on my own machine, this doesn't seem to be the case. Using this script:
from scipy.misc import comb
import random, time

def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

def timing(f):
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print '%s function took %0.3f ms' % (f.__name__, (time2-time1)*1000.0)
        return ret
    return wrap

@timing
def test_func(combination_func, nk):
    for n, k in nk:
        combination_func(n, k)

nk = []
for _ in range(1000):
    n = int(random.random() * 10000)
    k = random.randint(0, n)
    nk.append((n, k))

test_func(comb, nk)
test_func(choose, nk)
I get the following output:
$ python test.py
/usr/lib/python2.7/dist-packages/scipy/misc/common.py:295: RuntimeWarning: overflow encountered in exp
vals = exp(lgam(N+1) - lgam(N-k+1) - lgam(k+1))
999
test_func function took 32.869 ms
999
test_func function took 1859.125 ms
$ python test.py
/usr/lib/python2.7/dist-packages/scipy/misc/common.py:295: RuntimeWarning: overflow encountered in exp
vals = exp(lgam(N+1) - lgam(N-k+1) - lgam(k+1))
999
test_func function took 32.265 ms
999
test_func function took 1878.550 ms
Does this timing test show that the new scipy.misc.comb is faster than the ad-hoc choose() function? Is there any error in my test script that makes the timing inaccurate?
Why is scipy.misc.comb faster now? Is it because of some cython / C wrapping tricks?
EDITED
After @WarrenWeckesser's comment:
Using the default floating-point approximation, the scipy.misc.comb() computation breaks because of floating-point overflow (see http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.misc.comb.html for the documentation).
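For example, for a large pair like n = 10000, k = 5000 the warning above shows up and the float result is useless (a minimal sketch; the exact behaviour depends on the scipy version, and in newer releases the function lives in scipy.special.comb):

from scipy.misc import comb  # scipy.special.comb in newer scipy releases

n, k = 10000, 5000
# Default mode evaluates exp(lgam(n+1) - lgam(n-k+1) - lgam(k+1)), which
# overflows float64 here and comes back as inf (with the RuntimeWarning above).
print comb(n, k)
# exact=True does the arithmetic with arbitrary-precision long integers instead,
# so it returns the exact (roughly 3000-digit) value, but is much slower.
print comb(n, k, exact=True)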
When tested with exact=True, which computes with long integers instead of floating point, it's a lot slower when computing the same 1000 combinations, using the modified test function below:
@timing
def test_func(combination_func, nk):
    for i, (n, k) in enumerate(nk):
        combination_func(n, k, exact=True)
[out]:
$ python test.py
test_func function took 3312.211 ms
test_func function took 1764.523 ms
$ python test.py
test_func function took 3320.198 ms
test_func function took 1782.280 ms
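For what it's worth, a less ad-hoc way to time this would be the timeit module instead of the hand-rolled decorator; a minimal sketch, assuming the choose() function, the comb import and the nk list from the script above are already defined:

import timeit

# Time 10 passes over the same 1000 (n, k) pairs for each implementation.
scipy_time  = timeit.timeit(lambda: [comb(n, k) for n, k in nk], number=10)
choose_time = timeit.timeit(lambda: [choose(n, k) for n, k in nk], number=10)

print 'scipy.misc.comb: %0.3f ms per 1000 calls' % (scipy_time / 10 * 1000)
print 'choose:          %0.3f ms per 1000 calls' % (choose_time / 10 * 1000)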