It is a little tricky, but your inner products of vectors made up of -1s and 1s can be converted to XOR-ing and counting the non-zero bits of vectors made up of 0s and 1s. And of course the best container for a 32-item vector of 0s and 1s is a uint32. The following code does the same as what you propose, but runs vectorized in chunks of 2**m elements for speed:
import numpy as np

def bitwise_inner(n, m=12):
    bitmask = (1 << n) - 1
    m = min(m, 2*n)
    result = []
    chunk = 2**m
    for j in xrange(2**(2*n-m)):
        start = j * chunk
        items = np.arange(start, start+chunk, dtype=np.uint32)
        # High n bits hold one vector, low n bits the other; a set bit in the
        # XOR marks a position where the corresponding +/-1 entries disagree
        op = (items >> n) ^ (items & bitmask)
        # Keep only items with same number of 0s and 1s in the first n bits of op
        for k in xrange(n//2 - 1):
            # This removes a 1 from the binary representation of each number
            op &= op - 1
        # Discard items with fewer than n/2 set bits...
        mask = op != 0
        items = items[mask]
        op = op[mask]
        # ...and keep those where clearing one more bit leaves zero,
        # i.e. exactly n/2 set bits
        op &= op - 1
        result.append(items[op == 0])
    return sum([len(j) for j in result])
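To see why the bit trick works on a single pair of vectors: if bit 1 encodes +1 and bit 0 encodes -1 (any consistent encoding will do), the inner product equals n minus twice the number of set bits in the XOR, so it is zero exactly when the XOR has n/2 set bits. A minimal illustrative sketch (the function name is just for this example):

import numpy as np

def check_equivalence(a_bits, b_bits, n):
    # Decode the two n-bit integers back into +/-1 vectors (bit 1 -> +1, bit 0 -> -1)
    a = np.array([1 if (a_bits >> i) & 1 else -1 for i in range(n)])
    b = np.array([1 if (b_bits >> i) & 1 else -1 for i in range(n)])
    disagreements = bin(a_bits ^ b_bits).count('1')
    # inner product = (#agreements) - (#disagreements) = n - 2 * disagreements
    assert np.dot(a, b) == n - 2 * disagreements
    return disagreements == n // 2   # True exactly when the inner product is 0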
First, let's check correctness against your code:
import itertools
import operator

def python_inner(n):
    survivors = []
    for row in itertools.product([-1, 1], repeat=2*n):
        if sum(map(operator.mul, row[0:n], row[n:2*n])) == 0:
            survivors.append(row)
    return len(survivors)
In [2]: python_inner(8)
Out[2]: 17920
In [3]: bitwise_inner(8)
Out[3]: 17920
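As an extra cross-check of my own (not part of the code above): for even n the count has a closed form, since the XOR pattern must have exactly n/2 set bits and the first vector is unconstrained, giving 2**n * C(n, n/2) survivors. For n = 8 that is 256 * 70 = 17920, matching the outputs above.

from math import factorial

def closed_form_count(n):
    # 2**n choices for the first vector, times C(n, n/2) XOR patterns with
    # exactly n/2 disagreements (n assumed even)
    return 2**n * factorial(n) // (factorial(n // 2) ** 2)

closed_form_count(8)   # -> 17920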
And then some timings:
In [4]: %timeit python_inner(8)
1 loops, best of 3: 1.08 s per loop
In [5]: %timeit bitwise_inner(8)
100 loops, best of 3: 3.35 ms per loop
In [6]: %timeit bitwise_inner(12)
1 loops, best of 3: 1.07 s per loop
It is still going to take an awful lot of time to compute all values for n = 16, but this is nevertheless over two orders of magnitude faster.
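If you do want to push towards larger n, one knob worth knowing about is the chunk exponent m: raising it makes the outer Python loop run fewer times, at the cost of more memory per chunk. This is only a usage note, I have not timed it:

# Each chunk holds 2**m uint32 items; the outer loop runs 2**(2*n - m) times.
bitwise_inner(12, m=16)   # 256 outer iterations instead of 4096 with the default m=12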