23

Given [1,1,4,5,5,6] we can find 4 and 6 to be the non-repeating integers.

There is a solution using XOR.

Here is the algorithm proposed by the author:

#include <stdio.h>
#include <stdlib.h>

/* This finction sets the values of *x and *y to nonr-epeating
 elements in an array arr[] of size n*/
void get2NonRepeatingNos(int arr[], int n, int *x, int *y)
{
  int xor = arr[0]; /* Will hold xor of all elements */
  int set_bit_no;  /* Will have only single set bit of xor */
  int i;
  *x = 0;
  *y = 0;

  /* Get the xor of all elements */
  for(i = 1; i < n; i++)
   xor ^= arr[i];

  /* Get the rightmost set bit in set_bit_no */
  set_bit_no = xor & ~(xor-1);

  /* Now divide elements in two sets by comparing rightmost set
   bit of xor with bit at same position in each element. */
  for(i = 0; i < n; i++)
  {
    if(arr[i] & set_bit_no)
     *x = *x ^ arr[i]; /*XOR of first set */
    else
     *y = *y ^ arr[i]; /*XOR of second set*/
  }
}

I am confused as to what follows after 4^6. I am confused how the set_bit_no works (including the motivation) and whatever after that.

Can someone try to explain it in more plain English manner? Thanks.

Mohit Jain
  • 30,259
  • 8
  • 73
  • 100
user423455
  • 773
  • 2
  • 9
  • 19
  • If first bit set in xor was at position k, then all bits at positions higher than k would be different in xor and ~(xor-1) resulting in zero bits and all the bits at positions lower than k would be zero in both xor and ~(xor - 1) resulting in zero. Only bit at position k would be set in both resulting in correct value for set_bit_no. For more details look at my answer. – Mohit Jain Apr 09 '14 at 06:21
  • You should have explained that the numbers come in pairs, but for two of them. –  Apr 09 '14 at 06:35
  • Duplicate of [this](http://stackoverflow.com/questions/11843418/find-the-two-non-repeating-elements-in-an-array-of-repeating-element-xor-operato) – crisron Apr 17 '16 at 01:17

4 Answers4

30

If we have repeated pair of numbers, they would not add anything to xor results, as the xor of them would be zero. Only pair of different number would add non zero bits to xor result.

a xor a = 0
[a, b, c, b, d, a]
a xor b xor c xor b xor d xor a = c xor d

Now in c xor d, the only set bits are the bits that are different in c and d. Let's say 3rd bit is set in c xor d. This means if bit 3 is 0 in c it would be 1 in d or vice versa.

So if we divide all numbers in 2 group, one which contains all numbers with bit 3 is 0, and other in which bit 3 is 1, c and d would definitely go to different groups. And all pairs of same numbers would go the same group. (Bit 3 is either 1 on both a or 0 in both a)

Let's say the groups are

[a c a] and [b d b]
xoring them
a xor c xor a = c (The first number)
b xor d xor b = d (The second number)

Other possibilities of groups are

[c] and [a b d b a]
xoring them
c = c (The first number)
a xor b xor d xor b xor a = d (The second number)

and

[a b c b a] and [d]
xoring them
a xor b xor c xor b xor a= c (The first number)
d = d (The second number)

About

set_bit_no = xor & ~(xor-1);

If input array was composed of natural numbers, xor would be positive xor & ~xor is zero (Definition as all bits are inverted) On subtracting 1 from xor,

  1. If right most bit was zero it would be set to 1 and exit
  2. Reset rightmost bit to zero and try to add 1 to next bit (step 1)

In short all rightmost bits that were 1 would become zero(inverted back similar to xor) and first (rightmost) zero bit would become 1(same as xor). Now on anding, all bits left of this newly set 1 bit are different in xor and ~(xor-1), so they would generate 0, all bits right to this newly set 1 bit are zero in both xor and ~(xor-1) so they would generate 0. Only bit at bit position where 1 was newly set in ~(xor-1) is 1 in both case, so only this bit would be set in expression xor & ~(xor-1)

Mohit Jain
  • 30,259
  • 8
  • 73
  • 100
3

The simple way to explain here is that when you do a^b then only those bit positions are set that have different values in a from b. So if you group elements in the array with their values at particular set bit in a^b then a and b will be in separate groups as xor-ing the group will cancel out others and the result of two groups will be a and b.

Example :-

a = 4 
b = 6 
a^b = 2

Set_bit_pos = 1

Arr = [1,1,4,5,5,6]

Grouping according to bit no 1 

x = [6] where bit1 is 1 

y = [4,1,1,5,5] where bit1 is 0

Xoring 

x = 6 

y = 4^1^1^5^5 = 4
susenj
  • 342
  • 1
  • 4
  • 12
Vikram Bhat
  • 6,106
  • 3
  • 20
  • 19
  • Why not position 2, 3, 4 or whatever? How do we determine the right position to use? – user423455 Apr 09 '14 at 04:42
  • @user423455 Sorry my mistake 5 should be group with 4 because it is b101 so bit1 is 0 not 1. There is no need to take the rightmost set bit but any set bit will do because then x & y will be grouped into different groups and it separate them – Vikram Bhat Apr 09 '14 at 04:48
3

This algorithm would only work if and only if

1) elements are non-zero
2) contains no more than 2 non-repeating integers. If only 1
   non-repeating, one of the result (x or y) will be 0.
3) the repeated numbers occurs in pairs (ie. 2,4,6....)

If 0 is a possible number, then you can't differentiate between an answer found or no answer.

By XORing all the elements, this gives the difference between the 2 non-repeating integers (ie. 4 ^ 6 in your example). This is because all the other elements would be repeating (ie. even amount of times) and in XOR, they cancel themselves out. It is important to note that XOR is commutative (ie. order doesn't matter a^b = b^a)

Now the set_bit_no. This just stores the right most set bit or xor. Why the right most? Because it is easy to get I guess. But any set bit would do. The set bits in xor variable contains the bits where it is different between 4 and 6.

100 ^ 110 = 010

The second bit is 1 because that's the only bit different between 4 and 6. Similarly the difference between 3 and 8

0011 ^ 1000 = 1011

which shows 4th, 2nd and 1st bit are different between 3 and 8.

The reason to get the set bit and use that in the if condition is to make sure the answers (4 and 6) is written to different variable (x or y). Why does this work? Because the set bit guarantees that the 2 answers will contain different values at that bit position.

if(arr[i] & set_bit_no)
anonymous
  • 1,317
  • 8
  • 7
  • 1
    Yes it is meant for natural numbers, but it would work equally good for zero also if pre-condition 2(not more than or less than 2 repeating numbers) is met. If a pair of zero is there, it would be ignored by algo. If one zero is there, it would be properly calculated in this algo. – Mohit Jain Apr 09 '14 at 06:22
  • Like I said, if 0 could be an element in the array, then the array MUST CONTAIN EXACTLY 2 non-repeating integers. Otherwise, if it only contained 1 non-repeating number, the algo will return x or y as 0 and you can't tell whether that 0 is because there is ONE 0 in the array OR only 1 non-zero non-repeating number in the array. Imagine an array [0,0,1]. The algo would return 0 and 1. So this is wrong. – anonymous Apr 09 '14 at 06:39
  • If you doubt pre-condition 2 would be strictly followed in input, you can easily handle the case by counting the occurances of zero. – Mohit Jain Apr 09 '14 at 06:44
  • @MohitJain It is not I who doubt :-). But one needs to clearly define what the output of the function means. ie. When function returns x=1, y=0. What does this mean? x=1 is trivial. Now y=0 is not so trivial. Does it mean "No answer" or "0 is non-repeating"? Now you are suggesting to go through the entire array and count the number of zeros. That is possible but as a user of this function, I expect the function to return a proper and clear result. – anonymous Apr 09 '14 at 06:52
  • @MohitJain I guess a pre-condition of that function is something like "array must contain exactly 2 non-repeating integers". Then returning 0 would clearly mean there is definitely 1 zero in the array. – anonymous Apr 09 '14 at 06:59
  • That's exactly my point. Even if there is a zero in the input, I get a zero in the output. That means function works fine in contrast to your point 1 (Elements are non zero). I hope OP understands the limits, all he is interested in is, how set_bit_no is calculated and how does the things after that work. – Mohit Jain Apr 09 '14 at 09:02
0

When you xor two equal values, they cancel out. This allows to reveal the non-repeating pair.

XOR(aabbccddeeffgh) = XOR(gh) = ...1...

Knowing any bit on which g and h differ allows to set g and h aside in two subsets:

...0... => XOR(aabbcceeg) = g

...1... => XOR(ddffh) = h