It is a little tricky, but your inner products of vectors made up of -1s and 1s can be converted to XOR-ing and counting the non-zero bits of vectors made up of 0s and 1s. And of course the best container for a 32-item vector of 0s and 1s is a uint32. The following code does the same as what you propose, but runs vectorized in chunks of 2**m elements for speed:
import numpy as np

def bitwise_inner(n, m=12):
    bitmask = (1 << n) - 1
    m = min(m, 2*n)
    result = []
    chunk = 2**m
    for j in xrange(2**(2*n-m)):
        start = j * chunk
        items = np.arange(start, start+chunk, dtype=np.uint32)
        # High n bits hold one vector, low n bits the other; a set bit in the
        # XOR marks a position where the corresponding +/-1 entries disagree
        op = (items >> n) ^ (items & bitmask)
        # Keep only items with same number of 0s and 1s in the first n bits of op
        for k in xrange(n//2 - 1):
            # This removes a 1 from the binary representation of each number
            op &= op - 1
        # Discard items with fewer than n/2 set bits...
        mask = op != 0
        items = items[mask]
        op = op[mask]
        # ...and keep those where clearing one more bit leaves zero,
        # i.e. exactly n/2 set bits
        op &= op - 1
        result.append(items[op == 0])
    return sum([len(j) for j in result])
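To see why the bit trick works on a single pair of vectors: if bit 1 encodes +1 and bit 0 encodes -1 (any consistent encoding will do), the inner product equals n minus twice the number of set bits in the XOR, so it is zero exactly when the XOR has n/2 set bits. A minimal illustrative sketch (the function name is just for this example):

import numpy as np

def check_equivalence(a_bits, b_bits, n):
    # Decode the two n-bit integers back into +/-1 vectors (bit 1 -> +1, bit 0 -> -1)
    a = np.array([1 if (a_bits >> i) & 1 else -1 for i in range(n)])
    b = np.array([1 if (b_bits >> i) & 1 else -1 for i in range(n)])
    disagreements = bin(a_bits ^ b_bits).count('1')
    # inner product = (#agreements) - (#disagreements) = n - 2 * disagreements
    assert np.dot(a, b) == n - 2 * disagreements
    return disagreements == n // 2   # True exactly when the inner product is 0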
First, let's check correctness against your code:
import itertools
import operator

def python_inner(n):
    survivors = []
    for row in itertools.product([-1, 1], repeat=2*n):
        if sum(map(operator.mul, row[0:n], row[n:2*n])) == 0:
            survivors.append(row)
    return len(survivors)
In [2]: python_inner(8)
Out[2]: 17920
In [3]: bitwise_inner(8)
Out[3]: 17920
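As an extra cross-check of my own (not part of the code above): for even n the count has a closed form, since the XOR pattern must have exactly n/2 set bits and the first vector is unconstrained, giving 2**n * C(n, n/2) survivors. For n = 8 that is 256 * 70 = 17920, matching the outputs above.

from math import factorial

def closed_form_count(n):
    # 2**n choices for the first vector, times C(n, n/2) XOR patterns with
    # exactly n/2 disagreements (n assumed even)
    return 2**n * factorial(n) // (factorial(n // 2) ** 2)

closed_form_count(8)   # -> 17920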
And then some timings:
In [4]: %timeit python_inner(8)
1 loops, best of 3: 1.08 s per loop
In [5]: %timeit bitwise_inner(8)
100 loops, best of 3: 3.35 ms per loop
In [6]: %timeit bitwise_inner(12)
1 loops, best of 3: 1.07 s per loop
It is still going to take an awful lot of time to compute all values for n = 16, but this is nevertheless over two orders of magnitude faster.
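If you do want to push towards larger n, one knob worth knowing about is the chunk exponent m: raising it makes the outer Python loop run fewer times, at the cost of more memory per chunk. This is only a usage note, I have not timed it:

# Each chunk holds 2**m uint32 items; the outer loop runs 2**(2*n - m) times.
bitwise_inner(12, m=16)   # 256 outer iterations instead of 4096 with the default m=12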