5

I recently came across this question, and was unsure how to solve it. I was given an array of positive integers less than a million, and for each number, I had to compute the largest bit difference it had with any other number in the array, where the bit difference between n1 and n2 was defined as the number of set bits in n1 xor n2. So for example, the input

3 (011)
5 (101)
4 (100)

should output the array

3 (3 ^ 4 = 7, which has 3 set bits)
2 (5 ^ 3 = 6, which has 2 set bits)
3 (4 ^ 3 = 7, which has 3 set bits)

where each number in the output array corresponds to the maximum bit difference of all pairs that include the input with the same index in the input array (so for instance, the pair used for the output at the first index must include 3, since 3 is the first number in the input array.) The size of the array can be very large, so an O(N^2) solution where I considered each pair was too slow. I suspected that the solution had to do with exploiting the fact that the numbers must be less than a million, since that would mean each number only has 20 bits. However, I was unable to proceed with this idea.

  • [Possibly of interest](https://stackoverflow.com/a/109025/2166798). – pjs Mar 26 '23 at 19:30
  • 4
    Hint: try using a bit-wise trie. – Ian Mercer Mar 26 '23 at 19:36
  • `(4^3) == (3^4)`, you've omitted `4^5` which only has 1 set bit. – pjs Mar 26 '23 at 19:38
  • @pjs The fact that 4^5 only has 1 set bit means it is not a maximum for either the element 4 or the element 5. That's why it's not included in the output. – kaya3 Mar 26 '23 at 19:39
  • @pjs Right. The idea is that for each number in the output array, it corresponds to the maximum bit difference of all pairs that include the number with the same index in the input array. – tangerinescorpion Mar 26 '23 at 19:40
  • @IanMercer could you elaborate? The idea of tries had crossed my mind since I remember doing a problem similar to this one that had one of its solutions use them, but the truth is I am not too experienced with them. – tangerinescorpion Mar 26 '23 at 20:11
  • I elaborated for @IanMercer. – btilly Mar 26 '23 at 21:37

1 Answers1

2

Ian Mercer is both clever and right. I'll explain why it works to have a binary tree where each node you go left/right depending on whether you have a 0 or 1.

Exploring the tree is always fewer operations than brute force. But the brute force operations are extremely fast xor operations which run quickly. Where the tree wins is that on a long list you probably don't explore very much of it, while brute force does work proportional to the square of the size of the list.

Let's build some intuition into this. Two random numbers in this range are almost two random collections of 20 bits. Their bitwise difference is therefore the sum of 20 coin flips. We can easily calculate a binomial distribution (the probability of m bit differences is (20 choose m) / 2^20) and we find that the odds of differences go:

0 differences = 0.000000956...
1 differences = 0.000019073...
2 differences = 0.000181198...
3 differences = 0.001087188...
4 differences = 0.004620552...
5 differences = 0.014785766...

So if our list has more than 5500 elements, we've got a good chance of only exploring the parts of the tree with at most 2 differences, and that's not much.

If it has more than 1000 elements, odds are that we don't have to look at more than 3 differences. This is probably still better than brute force. When we DO have to look at differences of 4, my back of the envelope guess is that brute force would have been better. But we can be clever and start with the parts of the 4 bit differences in the tree where the third difference was as late as possible. So we won't have to look at much of the 4 bit difference part of the tree.

If the list has under 200 elements we've probably got to explore the tree out to 5+ differences. Brute force was likely faster. (xor is incredibly fast!) But now the whole tree isn't that big so we don't really care about performance.

btilly
  • 43,296
  • 3
  • 59
  • 88
  • So wait, I am a little confused on how this would work. Would you mind writing some code describing this approach? Is this of any relevance: https://www.geeksforgeeks.org/find-pair-rows-binary-matrix-maximum-bit-difference/? – tangerinescorpion Mar 26 '23 at 23:29
  • @tangerinescorpion I don't want to write code because it IS an interview question. But essentially you start with a `todo` list of the root of the node. Then for each node in `todo`, you go down the path of bit flips that leads the number you're looking at, putting the differences into a new `todo` of 1 bit difference. Repeat to find any number with a 1 bit difference while setting up the 2 bit difference search. Repeat again to find 2 bit differences while setting up the 3 bit difference search. And repeat. – btilly Mar 27 '23 at 08:09