Consider a particular bit (say the 10th). Suppose both arrays have length 100, that 19 elements of A have the 10th bit set, and that 22 elements of B have the 10th bit set. How many of the 100*100 values A[i]^B[j] will have the 10th bit set? The xor has the bit set exactly when it's set in A[i] but not in B[j], or vice versa. So there are 19*(100-22) + (100-19)*22 = 3264 such values. This calculation suggests we can perform the sum bit by bit efficiently.
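To convince yourself of the single-bit argument, here's a quick standalone sanity check (my own sketch, not part of the solution itself), using bit 10 as in the example above:

import random

A = [random.randrange(1 << 32) for _ in range(100)]
B = [random.randrange(1 << 32) for _ in range(100)]

cnt_a = sum((a >> 10) & 1 for a in A)  # elements of A with bit 10 set
cnt_b = sum((b >> 10) & 1 for b in B)  # elements of B with bit 10 set

# Bit 10 survives the xor when exactly one operand has it set.
predicted = cnt_a * (len(B) - cnt_b) + (len(A) - cnt_a) * cnt_b
brute = sum(((a ^ b) >> 10) & 1 for a in A for b in B)
assert predicted == brute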
More precisely, suppose you have 32-bit ints. For each i in 0..31, count the number of elements of A and of B with the i'th bit set; call these counts a[i] and b[i]. By the same argument as above, the i'th bits contribute (a[i]*(len(B)-b[i]) + (len(A)-a[i])*b[i]) << i to the overall sum.
This gives you a straightforward O((N+M)k) solution, where N and M are the lengths of A and B, and k is the number of bits in the largest int in either array.
Here's some Python code that implements this idea, including some randomised tests against the naive version:
def sum_xor(A, B):
    s = 0
    for i in range(32):
        # Count elements of each array with the i'th bit set.
        ai = sum((a >> i) & 1 for a in A)
        bi = sum((b >> i) & 1 for b in B)
        # Pairs where exactly one side has the bit set contribute 1 << i each.
        s += (ai * (len(B) - bi) + (len(A) - ai) * bi) << i
    return s
def sum_xor_slow(A, B):
    # Naive O(N*M) reference implementation.
    s = 0
    for a in A:
        for b in B:
            s += a ^ b
    return s
import random

all_ok = True
for trial in range(100):
    A = [random.randrange(1 << 32) for _ in range(random.randrange(100, 110))]
    B = [random.randrange(1 << 32) for _ in range(random.randrange(100, 110))]
    x0 = sum_xor(A, B)
    x1 = sum_xor_slow(A, B)
    ok = x0 == x1
    all_ok = all_ok and ok
    print('OK' if ok else 'FAIL', x0, x1)
assert all_ok
A pragmatic note: it will probably be faster to iterate once over A and once over B, accumulating all 32 bit counts in two arrays as you go, because that minimizes memory reads. (Indeed, that's how the first version of my code worked.) But I changed the code to the above because it's a lot simpler and has the same complexity; a sketch of the single-pass variant follows.
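For reference, here's roughly what that single-pass variant might look like (a reconstruction, not the original first version of the code):

def sum_xor_onepass(A, B):
    # Read each element exactly once, accumulating all 32 bit counts.
    a = [0] * 32
    b = [0] * 32
    for x in A:
        for i in range(32):
            a[i] += (x >> i) & 1
    for x in B:
        for i in range(32):
            b[i] += (x >> i) & 1
    # Combine the per-bit counts exactly as in sum_xor.
    return sum((a[i] * (len(B) - b[i]) + (len(A) - a[i]) * b[i]) << i
               for i in range(32))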