0
  • I have multiple two dimensional arrays.
  • I want to find all combinations consisting of one row from each array.

Example:

Let's say array A has rows a0, a1, a2. Let's say array B has rows b0, b1

The six combinations are:

a0-b0, a0-b1, a1-b0, a1-b1, a2-b0, a2-b1

Dash represents concatenation (np.hstack)

How to do that fast for arbitrary number of arrays (say, A, B, C, ...) ?

Fast method for 2 arrays: Combination of all rows in two numpy arrays

Code result for combining 3 arrays:

# input arrays:
[[0 1 2]]
[[ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]
[[17 18 19 20 21 22 23]
 [24 25 26 27 28 29 30]
 [31 32 33 34 35 36 37]]
# output:
[[ 0  1  2  5  6  7  8  9 17 18 19 20 21 22 23]
 [ 0  1  2  5  6  7  8  9 24 25 26 27 28 29 30]
 [ 0  1  2  5  6  7  8  9 31 32 33 34 35 36 37]
 [ 0  1  2 10 11 12 13 14 17 18 19 20 21 22 23]
 [ 0  1  2 10 11 12 13 14 24 25 26 27 28 29 30]
 [ 0  1  2 10 11 12 13 14 31 32 33 34 35 36 37]
 [ 0  1  2 15 16 17 18 19 17 18 19 20 21 22 23]
 [ 0  1  2 15 16 17 18 19 24 25 26 27 28 29 30]
 [ 0  1  2 15 16 17 18 19 31 32 33 34 35 36 37]]

Code:

import numpy as np

def form_combinations(xs):
    tot_size = np.sum([x.shape[1] for x in xs])
    n_rows = [x.shape[0] for x in xs]
    out = np.empty(n_rows + [tot_size])
    n_cols = [x.shape[1] for x in xs]
    cs = np.cumsum([0] + n_cols)
    n = np.newaxis
    out[:, :, :, cs[0]:cs[1]] = xs[0][:, n, n, :]
    out[:, :, :, cs[1]:cs[2]] = xs[1][n, :, n, :]
    out[:, :, :, cs[2]:cs[3]] = xs[2][n, n, :, :]
    out = out.reshape(-1, tot_size)
    return out


def main():
    xs = [
        np.arange(3)[np.newaxis, :],
        np.arange(5, 20).reshape(3, 5),
        np.arange(17, 38).reshape(3, 7)
    ]
    print(xs)

    out = form_combinations(xs)
    print(out)

main()
hamster on wheels
  • 2,771
  • 17
  • 50
  • look at [itertools.combinations](http://docs.python.org/library/itertools.html#itertools.combinations) (you can then chain your results using `itertools.chain.from_iterable`) – Nenri Mar 29 '19 at 15:01
  • why ? it's certainly way faster than your method anyway. Or if you want something from scipy/numpy : [scipy.misc.comb](https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.misc.comb.html) – Nenri Mar 29 '19 at 15:05
  • If you're looking at the code given in the doc, it's (as said above it) a *roughly equivalent* for comprehension. The actual code behind it is way more optimized. – Nenri Mar 29 '19 at 15:09
  • the example code doesn't even make a list (or list comprehension of n_row * n_row * n_row .. iterations). it is numpy array assignment. C vs python loop speed. – hamster on wheels Mar 29 '19 at 15:12
  • So, use C, if you complain about Python. Python isn't at all about speed. If you want speed, you should use a better language for that. – Nenri Mar 29 '19 at 15:14

3 Answers3

1

One way is to use a list comprehension where you just loop over all three arrays ad use hstack to stack them horizontally.

np.array([np.hstack((i, j, k)) for i in a for j in b for k in c])

# array([[ 0,  1,  2,  5,  6,  7,  8,  9, 17, 18, 19, 20, 21, 22, 23],
#        [ 0,  1,  2,  5,  6,  7,  8,  9, 24, 25, 26, 27, 28, 29, 30],
#        [ 0,  1,  2,  5,  6,  7,  8,  9, 31, 32, 33, 34, 35, 36, 37],
#        [ 0,  1,  2, 10, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23],
#        [ 0,  1,  2, 10, 11, 12, 13, 14, 24, 25, 26, 27, 28, 29, 30],
#        [ 0,  1,  2, 10, 11, 12, 13, 14, 31, 32, 33, 34, 35, 36, 37],
#        [ 0,  1,  2, 15, 16, 17, 18, 19, 17, 18, 19, 20, 21, 22, 23],
#        [ 0,  1,  2, 15, 16, 17, 18, 19, 24, 25, 26, 27, 28, 29, 30],
#        [ 0,  1,  2, 15, 16, 17, 18, 19, 31, 32, 33, 34, 35, 36, 37]])

Timings

%timeit np.array([np.hstack((i, j, k)) for i in a for j in b for k in c])

# 55.1 µs ± 2.61 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Sheldore
  • 37,862
  • 7
  • 57
  • 71
1

For arbitrary number of arrays, an incremental approach:

def combine(xs):
    comb=np.array([[]],int)
    for array in xs:
        left  = repeat(comb,len(array),axis=0)
        right = vstack([array]*len(comb))
        comb  = hstack((left,right))
    return comb
B. M.
  • 18,243
  • 2
  • 35
  • 54
1

Adapted from https://stackoverflow.com/a/49445693/7207392

import numpy as np
import operator as op
import itertools as it

def cartesian_product_pp(arrays, out=None):
    la = len(arrays)
    h, w = zip(*map(op.attrgetter('shape'), arrays))
    w = np.fromiter(it.chain([0], w), int, la+ 1)
    W = w.cumsum()
    h = *h, W[la]
    dtype = np.result_type(*arrays)
    arr = np.empty(h, dtype=dtype)
    arrs = *it.accumulate(it.chain((arr,), it.repeat(0, la-1)), np.ndarray.__getitem__),
    idx = slice(None), *it.repeat(None, la-1)
    for i in range(la-1, 0, -1):
        arrs[i][..., W[i]:W[i+1]] = arrays[i][idx[:la-i]]
        arrs[i-1][1:] = arrs[i]
    arr[..., W[0]:W[1]] = arrays[0][idx]
    return arr.reshape(-1, W[la])

# example
a = np.r_[:3].reshape(1, 3)
b = np.r_[5:20].reshape(3, 5)
c = np.r_[17:38].reshape(3, 7)
p = cartesian_product_pp([a, b, c])

Output:

>>> p
array([[ 0,  1,  2,  5,  6,  7,  8,  9, 17, 18, 19, 20, 21, 22, 23],
       [ 0,  1,  2,  5,  6,  7,  8,  9, 24, 25, 26, 27, 28, 29, 30],
       [ 0,  1,  2,  5,  6,  7,  8,  9, 31, 32, 33, 34, 35, 36, 37],
       [ 0,  1,  2, 10, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23],
       [ 0,  1,  2, 10, 11, 12, 13, 14, 24, 25, 26, 27, 28, 29, 30],
       [ 0,  1,  2, 10, 11, 12, 13, 14, 31, 32, 33, 34, 35, 36, 37],
       [ 0,  1,  2, 15, 16, 17, 18, 19, 17, 18, 19, 20, 21, 22, 23],
       [ 0,  1,  2, 15, 16, 17, 18, 19, 24, 25, 26, 27, 28, 29, 30],
       [ 0,  1,  2, 15, 16, 17, 18, 19, 31, 32, 33, 34, 35, 36, 37]])

Timings for this, @B.M.'s and @Bazingaa's approaches:

>>> timeit(lambda: cartesian_product_pp([a,b,c]), number=1000)*1000
15.173833002336323
>>> timeit(lambda: combine([a,b,c]), number=1000)*1000
31.1394709860906
>>> timeit(lambda: np.array([np.hstack((i, j, k)) for i in a for j in b for k in c]), number=1000)*1000
51.15771805867553
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99