Find three numbers appeared only once

Question

In a sequence of length n, where n=2k+3, that is there are k unique numbers appeared twice and three numbers appeared only once.

The question is: how to find the three unique numbers that appeared only once?

for example, in sequence 1 1 2 6 3 6 5 7 7 the three unique numbers are 2 3 5.

Note: 3<=n<1e6 and the number will range from 1 to 2e9

Memory limits: 1000KB , this implies that we can't store the whole sequence.

Method I have tried(Memory limit exceed)：

I initialize a tree, and when read in one number I try to remove it from the tree, if the remove returns false(not found), I add it to the tree. Finally, the tree has the three numbers. It works, but is Memory limit exceed.

I know how to find one or two such number(s) using bit manipulation. So I wonder if

we can find three using the same method(or some method similar)?

Method to find one/two number(s) appeared only once:

If there is one number appeared only once, we can apply XOR to the sequence to find it.

If there are two, we can first apply XOR to the sequence, then separate the sequence into 2 parts by one bit of the result that is 1, and again apply XOR to the 2 parts, and we will find the answer.

How many passes over the data are you allowed? Otherwise it would seem there is a trivial O(n^2) solution using equality (the equality operator is bitwise). — Akusete, Jun 09 '10 at 04:07
This is a math question so far. Show what you have done if you want help. — David Basarab, Jun 09 '10 at 04:07
@shilk: What the hell are you trying to do this for? I want to know some details (like why only one pass, why a memory limit exists etc) before I even think about this. This is homework, isn't it? — Matti Virkkunen, Jun 09 '10 at 05:06
@Matti Virkkunen: No, this is not homework. This is a problem that I found on a online judge website. — shilk, Jun 09 '10 at 05:38
@Matti Virkkunen: Assume good faith; don't automatically attack the OP because you think it might be homework (and even if it was, so what?). @shilk: The thing that bothers me is that you put no thought into your question; you basically just translated the puzzle from the website and put it here. You could have posted what you've tried, why it didn't work, etc. You didn't even ask for advice--you just posted the puzzle. — Sasha Chedygov, Jun 09 '10 at 06:12
And to the close voters: I don't see how this question is either off-topic or not a real question. It's definitely programming related, though it could have been asked in a better way. — Sasha Chedygov, Jun 09 '10 at 06:13
Tagging this question as "homework" or "interview-question" or "programming-competition" or something similar could have helped respondents phrase their answers more appropriately, and would have been more honest in the process. — Jason Hall, Jun 09 '10 at 06:44
You say "My solution involves a tree", but surely that doesn't fall under the "using only bit manipulation" criteria, does it? As such, it's not really a solution is it? — Lasse V. Karlsen, Jun 09 '10 at 07:09
You can not use the same algorithm for one or two numbers for three numbers. If the bit-patterns of the three numbers are: 110, 101 and 011, they will cancel each other out, so you won't know which bit to split on. — Lasse V. Karlsen, Jun 09 '10 at 07:14
@Lasse V. Karlsen: Yeah, this is where I am confused. As far as the memory limit is concerned, I think the only way is using bit manipulation. — shilk, Jun 09 '10 at 07:21
@shilk: the limit in memory does not necessarily mean that you can only view the list once. You could perfectly use a buffer, for example. — Matthieu M., Jun 09 '10 at 13:05
+1 for "Esoteric question + no useful purpose = homework?" Even if it is not homework it lands pretty high up on theory driven mental masturbation scale. — Evan Plaice, Jun 09 '10 at 18:29
@shilk: There is a way to do this only with bit manipulations (XOR etc) and only use O(1) extra space and does it in O(n) time. It makes five passes through the array though. (see my answer). — , Jun 10 '10 at 03:41
@Moron: Your answer is graceful. However, as you said, it makes five passes through the list. As we can't store the whole list, this means that we are allowed to traverse the whole list only once, right? — shilk, Jun 10 '10 at 04:03
@shilk: I was talking about a more general problem where you don't have the silly limits, just like the two unique element problem. Ignore my previous comments about Cam's solution being right. I misread 2e9 as 2^9. — , Jun 10 '10 at 05:43

score 9 · Answer 1 · 2010-06-10T05:30:06.603

For a more general version of this problem (without those silly limits):

You can do this in O(n) time and O(1) space without assuming any bounds, or iterating over all the bits, and using only O(1) time bit manipulation tricks like the XOR trick which worked for 2 missing numbers.

Here is (pseudo)code to find just one of the numbers:

// Given an array arr with 2k+3 numbers, k of which are repeated twice
// and the remaining three are distinct: a,b,c.
// returns one of a,b,c.
int FindUnique(int []arr) {

    int s = 0; // This will ultimately hold a ^ b ^ c (bitwise XOR)

    for (int i = 0; i < arr.Length; i++) {
        s ^= arr[i];
    }

    int d = 0; // this holds diff(a,s) ^ diff(b,s) ^ diff(c,s)

    for (int i = 0; i < arr.Length; i++) {
        d ^= diff(arr[i],s);
    }

    int e = lowestBit(d); // This gives the position where one of a,b,c differs 
                          // from the others.

    int bucket1 = 0;
    int bucket2 = 0;

    for (int i = 0; i < arr.Length; i++) {
        if (arr[i] & e) {
            bucket1 ^= arr[i];
        } else {
            bucket2 ^= arr[i];
        }
    }

    int count1 = 0;
    int count2 = 0;

    for (int i = 0; i < arr.Length; i++) {
        if (arr[i] == bucket1) {
            count1++;
        }

        if (arr[i] == bucket2) {
            count2++;
        }
    }

    if (count1 == 1) return bucket1;

    return bucket2;
}

// return a number with the lowest bit of x ^ s set to 1 and rest 0.
// i.e. the lowest bit position where x and s differ.
int diff(int x, int s) {
    return lowestBit(x ^ s);
}

// Returns a number with only the lowest bit of y set.
int lowestBit(int y) {
    return y & ~(y-1);
}

The idea is as follows:

Say the numbers which appear once are a,b,c.

Now run the XOR through the array to get s = a XOR b XOR c.

Since the numbers are distinct, notice that s cannot be either a or b or c (as the other two will be equal then), thus there is at least one bit (not necessarily at the same position), where each of a,b,c differs from s.

In the two number case, we could see that s is non-zero and pick a bit which differentiated a & b and work with that.

We run into difficulties with that when we have three numbers, but we can still find a bit to differentiate one of the numbers out.

For each number x, find the lowest bit which differs from s. Consider the binary number in which only that bit is set to one and the rest are zero. Call this number diff(x).

Now if we compute diff(x) for each number and XOR them together, we get d = diff(a) XOR diff(b) XOR diff(c).

Notice that d cannot be zero.

Now find the lowest set bit of d. This bit position can be used to bucket out one of a,b,c, as not all of a,b,c can have the same bit at that position: if they did, then that bit of s which is the XOR of those three must be the same, but we ensured that we picked that bit of s to differ from at least one of the corresponding bits in a,b,c.

So we XOR again, differentiating on this bit, and check which of the two resulting numbers appears exactly once in the array. Once we find one number, we know how to deal with two numbers.

To find the diff just use the bithack: x & ~(x-1), which is a standard bit-hack and can be considered O(1) (instead of O(number of bits)).

score 7 · Answer 2 · edited Jun 10 '10 at 05:02

This is a classic question - I was actually just asked it a few weeks ago. To solve it, you take the number of possible distinct numbers that could appear, and allocate that many bits.

For example, if numbers in the list must be between 1-20, you allocate 20 bits - one for each number, and you initialize each bit as 0.

Then you traverse the list. Every time you see a number, flip the corresponding bit.

For example: With your example list of 2 6 3 6 5 7 7, we could allocate 7 bits (for 1 2 3 4 5 6 7). Then as we traverse the list, we would do the following:

flip 2nd bit
flip 6th bit
flip 3rd bit
flip 6th bit
etc

Then once you're done traversing the list, you can read through the bits to find the three unique numbers. They will all be represented by '1' bits, and the other numbers will be represented by 0s.

You read through the list twice, which takes 2n time, which is O(n).

Edit: It is possible that the bounds will not be given. One solution, then, is to simply read through the list first to determine the bounds yourself - then it's still O(n).

However one problem which could occur is that the list could be very small, but some very large numbers - effectively making the range too big. For example:

1, 99999999999999999, 1, 99999999999999999, 2, 3, 4

Solving that problem would require a lot of memory due to the large number present in the list, because even though there are very few numbers, the range is very large and we are allocating bits according to the range.

The solution could then be adjusted to give a new solution as follows using a hashtable (although I'm not sure if this is permitted given the problem's "bit manipulation only" stipulation):

Let L denote the original list, and C denote a copy of it.
Remove all duplicates from C (there are numerous ways of doing this efficiently).
Create a hashtable H, and for each element in C, insert a key/value pair <number,pos> into H where number is the current element in C, and pos is its position in C. So, given a number that appears in L, we can now use H to find that number's position in C.
Allocate a number of bits equal to the size of C, and initialize those bits to 0.
Traverse L. Each time we run accross a number, get its value from H, and flip that bit in our bit list.
Traverse the bit list - for each '1' bit, get the number from C which is at that position - that is one of the unique numbers.

No, but you can determine the bounds for a given list by iterating through the list once and finding the lowest/highest values in it. I will update my answer accordingly. — Cam, Jun 09 '10 at 04:12
Thank you for your reply. I have thought of this method earlier. But the range of numbers is 1-2e9, and n is 3-1e6. So this method will not work. — shilk, Jun 09 '10 at 04:20
@shilk: I've edited my answer accordingly. You can get around that using a hashtable. — Cam, Jun 09 '10 at 04:26
If you use every bit available, you can use about 250 MB of RAM and hold the data. Just allocate range / 8 bytes and when you get an index, use num >> 3 to divide by 8 and get the real index. Then you can use num & 0x3f to find the offset within that byte. If you need to solve such a problem, you probably have a computer with 250 MB of RAM that you can use for a little while. — Jonathan Sternberg, Jun 09 '10 at 04:43
@shilk: 1mb? You can't even store the list in that. And above you mentioned you can only traverse the list once... are you sure you have the bounds/rules right? This is getting pretty ridiculous. — Cam, Jun 09 '10 at 05:06
@Cam: Yeah, I'm fairly sure about the bounds. As we can not store the whole list, we can only read one and manipulate one. — shilk, Jun 09 '10 at 05:22
If you can fit all the values in a hashtable, you can use it directly to solve the problem, no need to add the bit set. — Dave L., Jul 25 '10 at 21:15

Nabb · Accepted Answer · 2010-06-09T19:10:26.063

You can do this in a similar way to the simpler cases of one and two different values.

We need two integers for each bit of the numbers (e.g. 32 bits). For each number, if that bit is zero, XOR the first integer with it. If it isn't, XOR the second integer with it.

Also, keep count of how many times you find a 1 or 0 in each position (we only need to check if this is even or odd, so keep a boolean).

After iterating through, our pairs of integers will be one of the following. The first number here represents an even count, the second an odd.

0, a^b^c
a^b, c
a^c, b
b^c, a

For each pair, check the even count integer. If it is zero, then we know the other integer is a^b^c, since no two of our results will be equal. Otherwise, we've found a value at the odd count integer.

public static int[] find3(int[] list) {
    int[][] xors = new int[32][2];
    boolean[] counts = new boolean[32];
    for (int curr : list) {
        for (int i = 0; i < 32; i++) {
            xors[i][(curr & (1 << i)) >> i] ^= curr;
            counts[i] ^= ((curr & (1 << i)) == (1 << i));
        }
    }

    // this really shouldn't take so many lines
    int[] ret = new int[3];
    int found = 0;
    for (int i = 0; i < 32; i++) {
        int oddCount = xors[i][counts[i] ? 1 : 0];
        int evenCount = xors[i][counts[i] ? 0 : 1];
        if (evenCount != 0) { // avoid the 0, a^b^c case.
            if (found == 0) {
                ret[0] = oddCount;// a
                ret[2] = evenCount;// b^c for now
                found++;
            } else if (found == 1 && ret[0] != oddCount) {
                ret[1] = oddCount;// b
                ret[2] ^= oddCount;// (b^c)^b == c
                break;
            }
        }
    }
    return ret;
}

@shilk. I agree. Good solution for the stated bounds. – Jun 10 '10 at 05:27 — , Jun 10 '10 at 05:27

score 6 · Answer 4 · answered Jun 09 '10 at 07:55

6

If a probabilistic solution will suffice then you could use a Bloom Filter.

Create two Bloom filters. The first (A) contains numbers that have been found at least one, and the second (B) contains numbers that have been found twice.

Pseudocode:

A = empty
B = empty

foreach x in the list
  if x in A
    add x to B
  else
    add x to A

foreach x in the list
  if x in A
    if !(x in B)
      print x

If you use the full 1000KB then the probability of error would be ridiculously low.

answered Jun 09 '10 at 07:55

Peter Alexander

53,344
14
119
168

How can you traverse the list twice as we don't have enough memory to store the whole list? I don't think Bloom Filter will work in this situation. – shilk Jun 09 '10 at 08:48
@shilk: a bloom filter is a glorified bit array, so its extremely compact. You "add" items to bloom filter by setting bits at index `hashcode % array.length` to 1 for several different hash functions, and you test for set membership in a similar way. This is a perfectly adequate, probabilistic solution to your question. – Juliet Jun 09 '10 at 19:52
@Juliet, he's right about the second traversal though. You can't use the Bloom filter to re-traverse the elements, and at the same time we can't store the elements :-/ -- I missed that bit. – Peter Alexander Jun 10 '10 at 07:21

score 1 · Answer 5 · answered Jun 09 '10 at 17:48

The problem gets harder and harder as you add more unique values, mainly because you can choose A,B,C such that A xor B xor C = 0. It gets harder and harder to detect if a subset of the values has the same checksum because it contains all the unique values, or because it omitted values which happened to xor to 0.

You can do 3 values in constant space and O(n*k) time, where k is the number of bits in the largest integer. (So O(n) time for your typical case: 32-bit integers.)

It would be interesting to find out if the time bound becomes non-linear in N as the number of unique values increases and you continue to require constant space.

//Special check for 0, because otherwise we don't know A xor B xor C != A xor B
if items unique-contains 0 then
    return 0 ++ SubProblem2Unique(items - 0)
//Compute A xor B xor C
val x = fold xor items
//Try to find a split which separates A and B from C.
for i in 0..WORD_SIZE
    //see if the checksum splits
    val x1 = fold xor [e in items where e & (1<<i) == 0]
    val x2 = x xor x1
    if x1 == x or x2 == x then continue //ith bit was the same for A and B and C
    //C is either x1 or x2
    val C = if items unique-contains x1 then x1 else x2
    return C ++ SubProblem2Unique(items - C)

throw InvalidInput

score 0 · Answer 6 · answered Jul 01 '10 at 20:59

0

Why not use a a hashset? - If a number already exists, delete from hashset - if a number does not exist, put into hashset The final hashset contains only unique numbers. Time: O(n) Memory:o(k) where k is number of distinct elements.

With hashset approach, the solution is scalable and can be used to determine any number of unique elements in any given sequence.

answered Jul 01 '10 at 20:59

Amit

660
6
8

Because you can't fit half a million 32-bit values into a hashset in 1000 KB. – Dave L. Jul 25 '10 at 21:18

Find three numbers appeared only once

6 Answers6

Linked