How to create index combinations (k out of n) as sparse bitmasks for numpy

Question

For numpy how can I efficiently create

an array/matrix representing a list of all combinations (k out of n) as lists of k indices. The shape would be (binomial(n, k), k).
a sparse array/matrix representing this combinations as bitmasks of length n. (So expanding aboves indices to bitmask.) The shape would be (binomial(n, k), n).

I need to do this with large n (and maybe small k). So the algorithm should be

time efficient (e.g. maybe allocate complete result space at once before filling it?)
space efficient (e.g. sparse bitmasks)

Many Thanks for your help.

Have you tried something? `binomial(n, k)` is a factorial, so no matter how fast you try to do this it will take a lot of time and space if the numbers are somewhat big... — jdehesa, Apr 16 '18 at 16:22

sascha · Accepted Answer · 2018-04-16T17:04:43.067

Assuming the blowup is not that bad (as mentioned in the comment above), you might try this. It's pretty vectorized and should be fast (for cases which could be handled).

Edit: i somewhat assumed you are interested in an output based on scipy.sparse. Maybe you are not.

Code

import itertools
import numpy as np
import scipy.sparse as sp

def combs(a, r):
    """
    Return successive r-length combinations of elements in the array a.
    Should produce the same output as array(list(combinations(a, r))), but
    faster.
    """
    a = np.asarray(a)
    dt = np.dtype([('', a.dtype)]*r)
    b = np.fromiter(itertools.combinations(a, r), dt)
    b_ = b.view(a.dtype).reshape(-1, r)
    return b_

def sparse_combs(k, n):
    combs_ = combs(np.arange(n), k)
    n_bin = combs_.shape[0]

    spmat = sp.coo_matrix(( np.ones(n_bin*k),
                            (np.repeat(np.arange(n_bin), k),
                             combs_.ravel()) ),
                            shape=(n_bin, n))
    return spmat


print('dense')
print(combs(range(4), 3))
print('sparse (dense for print)')
print(sparse_combs(3, 4).todense())

Output

dense
[[0 1 2]
 [0 1 3]
 [0 2 3]
 [1 2 3]]
sparse (dense for print)
[[ 1.  1.  1.  0.]
 [ 1.  1.  0.  1.]
 [ 1.  0.  1.  1.]
 [ 0.  1.  1.  1.]]

The helper-function combs i took (probably) from this question (sometime in the past).

Small (unscientific) timing:

from time import perf_counter as pc
start = pc()
spmat = sparse_combs(5, 50)
time_used = pc() - start
print('secs: ', time_used)
print('nnzs: ', spmat.nnz)

#secs:  0.5770790778094155
#nnzs:  10593800

(3, 500)
#secs:  3.4843752405405497
#nnzs:  62125500

Thanks. Using `np.full(n_bin*k, True)` instead of `np.ones(n_bin*k)` gives a boolean result and is slightly faster. Apart from this I haven't found anything faster so far. — saluto, Apr 16 '18 at 19:16
@saluto I think i would prefer ```np.ones(n_bin*k, dtype=bool)``` to the ```np.full()``` approach (feels more natural to me). — sascha, Apr 16 '18 at 19:24
And for `k = 2` and `n >= 12` replacing `combs(...)` with `np.asarray(np.triu_indices(n, 1)).T` gives further performance improvement. — saluto, Apr 16 '18 at 22:31

How to create index combinations (k out of n) as sparse bitmasks for numpy

1 Answers1

Code

Output

Small (unscientific) timing: