4

I am looking for an algorithm to solve the following problem: We are given an integer array of size n which contains k (0 < k < n) elements that occur exactly once. Every other integer in the array occurs an even number of times. The output should be any one of the k unique numbers. k is a fixed number and not part of the input.

An example would be the input [1, 2, 2, 4, 4, 2, 2, 3] with both 1 and 3 being a correct output.

Most importantly, the algorithm should run in O(n) time and require only O(1) additional space.

edit: There has been some confusion regarding whether there is only one unique integer or multiple. I apologize for this. The correct problem is that there is an arbitrary but fixed number of them. I have updated the original question above.

"Dante." gave a good answer for the case that there are at most two such numbers. This link also provides a solution for three. "David Eisenstat" commented that it is also possible to do for any fixed k. I would be grateful for a solution.

Andreas T
  • I'm positive that I have seen this on SO already, because the answer stuck with me. Curiously, I can't find it.... Wait, [this should be it](http://stackoverflow.com/questions/19203868/faster-algorithm-to-find-unique-element-between-two-arrays). – G. Bach Sep 11 '15 at 13:49
  • @G.Bach Might be this one, no? http://stackoverflow.com/a/7907402/5325238 – Daniel Sep 11 '15 at 13:55
  • @Daniel Actually, I misread the question and your link seems to be the one I should have been looking for. – G. Bach Sep 11 '15 at 14:01
  • Is it certain that there can be *2* unique elements? – Antti Haapala -- Слава Україні Sep 11 '15 at 14:07
  • [This one](http://stackoverflow.com/questions/3003021/find-three-numbers-appeared-only-once/3010534#3010534) also seems related. – G. Bach Sep 11 '15 at 14:08
  • If the number of unique integers is not fixed or at least bounded, then the space complexity must be O(n) in order to hold the result for each unique number. – Fermat's Little Student Sep 11 '15 at 14:19
  • @AndreasT Hopefully, the time and space constraints are met in my answer. – Sumeet Sep 11 '15 at 14:56
  • I did a similar exercise yesterday, and think that you could adapt my solution to fit your needs. http://stackoverflow.com/questions/32510114/remove-duplicates-algorithm-in-place-and-stable-javascript It moves all unique elements to the front of the array. So you could return any of the uniques (if found). I'm not great with space/time analysis but I think the run time is O(n). I do create a hash to keep track of the dupes though, so not sure if that violates your O(1) space requirement – ahnkee Sep 11 '15 at 15:01
  • It's known to be possible if the number of odd-occurring numbers is bounded (in which case this is a duplicate). Otherwise, there's certainly no one-pass algorithm by standard communication complexity arguments (and, I would conjecture, no k-pass algorithm). I think that it's very, very likely that you misheard the question in this case. – David Eisenstat Sep 11 '15 at 15:25
  • @DavidEisenstat When you say bounded, do you mean "at most constantly many, but possibly more than one", i.e. something fixed parameter? I couldn't find an answer that covers that case (and also can't work out one myself). – G. Bach Sep 11 '15 at 22:37
  • @G.Bach Yes. I can't seem to find it and I don't remember exactly how it works, but it was along the lines of compute the XOR of a hash of each element for several different hashes, which is enough information for reconstruction. – David Eisenstat Sep 12 '15 at 00:32
  • @DavidEisenstat Do you remember where you saw this solution? I would be very interested. – Andreas T Sep 12 '15 at 09:49

6 Answers

10

There is a standard algorithm to solve such problems using the XOR operator:

Time Complexity = O(n)

Space Complexity = O(1)

Suppose your input array contains only one element that occurs an odd number of times and the rest occur an even number of times. We take advantage of the following fact:

Any XOR expression over an even number of 0's and an even number of 1's, in any order, always evaluates to 0.

That is

0^1^....... = 0 as long as the number of 0's is even and the number of 1's is even 

and the 0's and 1's can occur in any order.

At every bit position, the numbers that occur an even number of times contribute an even number of 1's and 0's, so they cancel; only the bits of the number that occurs once are left over when we take the XOR of all elements of the array, because

0 (from numbers occurring an even number of times) ^ 1 (from the number occurring once) = 1 

0 (from numbers occurring an even number of times) ^ 0 (from the number occurring once) = 0

As you can see, only the bits of the number occurring once are preserved.

This means that when you XOR all the elements of such an array, the result is the number that occurs only once.

So the algorithm for an array of length n is:

 result = array[0]^array[1]^.....array[n-1] 
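
For concreteness, here is a minimal C sketch of this single-element case (the function name and the plain-int-array signature are my own assumptions; it presumes exactly one element occurs an odd number of times):

#include <stddef.h>

/* XOR-fold the array: 0 is the identity for XOR, pairs cancel,
   and the lone odd-count element is what remains. */
int find_single_odd(const int *array, size_t n)
{
    int result = 0;
    for (size_t i = 0; i < n; i++)
        result ^= array[i];
    return result;
}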

Different Scenario

As the OP mentioned, the input can also be an array in which two numbers occur only once and the rest occur an even number of times.

This is solved using the same logic as above, with a small difference.

Idea of the algorithm:

If you take the XOR of all the elements, the bits of the elements occurring an even number of times cancel to 0, which means:

The result has a 1 exactly at those bit positions where the two numbers occurring only once differ.

We will use the above idea.

Now we pick any bit of the resulting XOR that is 1 and clear the rest. The result is a mask that lets us differentiate between the two required numbers.

Because that bit is 1, the two numbers differ at that position: one has a 0 there and the other has a 1. This means that ANDing one of them with the mask gives 0, and ANDing the other does not.

Since it is easy to isolate the rightmost set bit, we take it from the XOR result as

 A = result & ~(result-1)

Now traverse the array once; if array[i] & A is nonzero, fold that element into number_1 as

 number_1 = number_1^array[i]

otherwise

 number_2 = number_2^array[i]

Because the remaining numbers occur an even number of times, their bits cancel out automatically.

So the algorithm is:

1. Take the XOR of all elements; call it xor.

2. Isolate the rightmost set bit of xor and store it in A.

3. Do the following:

number_1 = 0, number_2 = 0;
for (i = 0; i < n; i++)
{
    if (array[i] & A)
        number_1 = number_1 ^ array[i];
    else
        number_2 = number_2 ^ array[i];
}

The number_1 and number_2 are the required numbers.
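
Putting the whole two-element procedure together, a small C sketch (the function name and the output-array convention are my own assumptions; it presumes exactly two values occur an odd number of times):

#include <stddef.h>

/* Writes the two odd-count values to out[0] and out[1]. */
void find_two_odd(const int *array, size_t n, int out[2])
{
    unsigned xor_all = 0;
    for (size_t i = 0; i < n; i++)
        xor_all ^= (unsigned)array[i];      /* xor_all == out[0] ^ out[1] */

    unsigned A = xor_all & ~(xor_all - 1);  /* rightmost set bit: a position
                                               where the two values differ */
    unsigned number_1 = 0, number_2 = 0;
    for (size_t i = 0; i < n; i++) {
        if ((unsigned)array[i] & A)
            number_1 ^= (unsigned)array[i]; /* group with that bit set */
        else
            number_2 ^= (unsigned)array[i]; /* group with that bit clear */
    }
    out[0] = (int)number_1;
    out[1] = (int)number_2;
}

For example, on the array [1, 2, 2, 3] this reports 3 and 1 (the two odd-count values), in some order.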

Sumeet
  • Can you generalize this to *k* elements that occur only once? – aquinas Sep 11 '15 at 14:56
  • @aquinas No we cannot. At least according to my past lectures on algorithms. – Sumeet Sep 11 '15 at 15:01
  • You cannot generalize because even for `k=3` you can construct a sequence whose XOR is 0. Example: `{1, 2, 3}`. All bits appear an even number of times even though each number appears once. – fjardon Sep 11 '15 at 15:08
  • @aquinas The OP should have mentioned that. I can assure you that given the time and space constraints, this is the right way. I have done problems like this in the past. Actually, we need to remember that not all questions are perfectly specific. – Sumeet Sep 11 '15 at 15:10
  • @fjardon That's what I said. Anyway :) :). – Sumeet Sep 11 '15 at 15:11
  • I don't disagree with you, I'm just saying the problem *as stated* indicates that there can be k unique elements and we need to find *one* of them not all k of them. – aquinas Sep 11 '15 at 15:27
  • @WashingtonGuedes The OP and I have specified that other numbers will occur EVEN number of times. – Sumeet Sep 11 '15 at 17:00
  • @WashingtonGuedes Its ok. – Sumeet Sep 11 '15 at 17:02
  • When I read _to set the right most bit_, what comes to my mind is `A = result | 1;` .. But you've used `A = result & ~(result-1)`.. Sorry my English and if I'm being stupid, but I'm not understanding this line, can you explain in more details? –  Sep 11 '15 at 17:31
  • @WashingtonGuedes It means making all the bits 0 except the rightmost set bit, that is, the rightmost bit which is 1. For example, in 1110, retaining the rightmost set bit would result in 0010. – Sumeet Sep 11 '15 at 18:02
  • Dude, took me a long while, it is awesome!! but finally I understood it now.. I just disagree with the term _very easy_.. Anyways, good one!! +1 –  Sep 11 '15 at 18:55
  • Sorry for the confusion, Dante.. I have tried to clarify my problem in the original post. Considering a fixed k instead of a variable one, would you know a solution with the described complexity constraints? – Andreas T Sep 12 '15 at 09:48
  • @AndreasT Not if k>2. – Sumeet Sep 12 '15 at 09:51
3

The question title says "the unique integer", but the question body says there can be more than one unique element.

If there is in fact only one non-duplicate: XOR all the elements together. The duplicates all cancel, because they come in pairs (or higher multiples of 2), so the result is the unique integer.

See Dante's answer for an extension of this idea that can handle two unique elements. It can't be generalized to more than that.

Perhaps for k unique elements, we could use k accumulators to track the power sums sum(a[i]), sum(a[i]^2), ..., sum(a[i]^k). This probably only works for the "Faster algorithm to find unique element between two arrays?" case, not this case where the duplicates are all in one array. IDK if an XOR of squares, cubes, etc. would be any use for resolving things.

Peter Cordes
3

Here's a Las Vegas algorithm that, given k, the exact number of elements that occur an odd number of times, reports all of them in expected time O(n k) (read: linear-time when k is O(1)) and space O(1) words, assuming that "give me a uniform random word" and "give me the number of 1 bits set in this word (popcount)" are constant-time operations. I'm pretty sure that I'm not the first person to come up with this algorithm (and I'm not even sure that I'm remembering all of the refinements), but I've reached the limits of my patience trying to find it.

The central technique is called random restrictions. Essentially what we do is to filter the input randomly by value, in the hope that we retain exactly one odd-count element. We apply the classic XOR algorithm to the filtered array and check the result; if it succeeded, then we pretend to add it to the array, to make it even-count. Repeat until all k elements are found.

The filtration process goes like this. Treat each input word x as a binary vector of length w (it doesn't matter what w is). Compute a random binary matrix A of size ceil(1 + lg k) by w and a random binary vector b of length ceil(1 + lg k). We filter the input by retaining those x such that Ax = b, where the left-hand side is a matrix multiplication mod 2. In implementation, A is represented as ceil(1 + lg k) row vectors a1, a2, .... We compute the bits of Ax as popcount(a1 & x) mod 2, popcount(a2 & x) mod 2, .... (This is convenient because we can short-circuit the comparison with b, which shaves a factor lg k from the running time.)
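
As a sketch of just this filter test in C (my own naming; it assumes w ≤ 64 so each row of A fits in one machine word, and uses GCC/Clang's __builtin_parityll for the mod-2 dot product):

#include <stdint.h>
#include <stdbool.h>

/* True iff Ax = b over GF(2); the m rows of A are the words a[0..m-1]
   and the bits of b are b_bits[0..m-1]. */
static bool passes_filter(uint64_t x, const uint64_t *a, const int *b_bits, int m)
{
    for (int i = 0; i < m; i++) {
        int bit = __builtin_parityll(a[i] & x);  /* (Ax)_i = <a_i, x> mod 2 */
        if (bit != b_bits[i])
            return false;                        /* short-circuit against b */
    }
    return true;
}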

The analysis is to show that, in a given pass, we manage with constant probability to single out one of the odd-count elements. First note that, for some fixed x, the probability that Ax = b is 2^-ceil(1 + lg k) = Θ(1/k). Given that Ax = b, for all y ≠ x, the probability that Ay = b is less than 2^-ceil(1 + lg k). Thus, the expected number of elements that accompany x is less than 1/2, so with probability more than 1/2, x is unique in the filtered input. Sum over all k odd-count elements (these events are disjoint), and the probability is Θ(1).
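
To make the constants concrete (a worked instance of my own): for k = 4 we have ceil(1 + lg 4) = 3, so a fixed odd-count x survives the filter with probability 2^-3 = 1/8 = Θ(1/k). The expected number of other odd-count elements surviving alongside it is at most (k - 1) * 2^-3 = 3/8 < 1/2, so conditioned on x surviving it is alone with probability above 1/2, and summing the disjoint events over the 4 odd-count elements gives a success probability of at least 4 * (1/8) * (1/2) = 1/4 = Θ(1).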


Here's a deterministic linear-time algorithm for k = 3. Let the odd-count elements be a, b, c. Accumulate the XOR of the array, which is s = a ^ b ^ c. For each bit i, observe that, if a[i] == b[i] == c[i], then s[i] == a[i] == b[i] == c[i]. Make another pass through the array, accumulating the XOR of the lowest set bit of s ^ x. The even-count elements again contribute nothing. Two of the odd-count elements contribute the same bit and cancel each other out. Thus, the lowest bit set in the XOR is where exactly one of the odd-count elements differs from s. We can use the restriction method above to find it, then the k = 2 method to find the others.

David Eisenstat
  • Why is the probability that some y!=x satisfies Ay=b less than that term? (2^(-ceil(1 + lg(k)))). And from that, how do you get to the expected number of elements being less than 1/2? – Andreas T Sep 16 '15 at 16:55
  • @AndreasT The probability that Ay=b is the probability that A(y-x) = 0, and y-x is nonzero, so half of the possible rows in A "disqualify" y. The strictness is because x is never disqualified. The expected number of elements is by summing probabilities and observing that the sum is less than k 2^-ceil(1 + lg k) < 1/2. – David Eisenstat Sep 16 '15 at 17:11
  • That makes sense for the probability of Ay=b. For the expected number of elements, do I understand correctly that you XOR all those y with the initial x? That way the non-unique elements which satisfy the equation cancel out themselves. That is why you only have to sum over k, not over n. And you keep repeating this process until the result is x again? – Andreas T Sep 18 '15 at 09:49
  • @AndreasT Right -- there may be many even-occurring elements that satisfy the equation, but they cancel. – David Eisenstat Sep 18 '15 at 12:46
  • Thanks, I get it now. Randomized algorithms are fascinating. This seems to be the most general answer (with k being arbitrary), so I'll mark it as best solution. – Andreas T Sep 18 '15 at 13:30
1

Track the count of each element and return only the elements with a count of 1. This can be done with a hash map. The example below tracks the result in a hash set while it is still building the counts map. It is still O(n), just less efficient, but I think it's slightly more instructive.

Javascript with jsfiddle http://jsfiddle.net/nmckchsa/

function findUnique(arr) {
    var uniq = new Map();
    var result = new Set();
    // iterate through array
    for(var i=0; i<arr.length; i++) {
        var v = arr[i];
        // add value to map that contains counts
        if(uniq.has(v)) {
            uniq.set(v, uniq.get(v) + 1);
            // count is greater than 1, so remove it from the result set
            result.delete(v);
        } else {
            uniq.set(v, 1);
            // add a possibly unique value to the set
            result.add(v);
        }
    }
    // set to array O(n)
    var a = [], x = 0;
    result.forEach(function(v) { a[x++] = v; });
    return a;
}
alert(findUnique([1,2,3,0,1,2,3,1,2,3,5,4,4]));

EDIT: Since the non-unique numbers appear an even number of times, @PeterCordes suggested a more elegant set toggle.

Here's how that would look.

function findUnique(arr) {
    var result = new Set();
    // iterate through array
    for(var i=0; i<arr.length; i++) {
        var v = arr[i];
        if(result.has(v)) { // even occurrences
            result.delete(v);
        } else { // odd occurrences
            result.add(v);
        }
    }
    // set to array O(n)
    var a = [], x = 0;
    result.forEach(function(v) { a[x++] = v; });
    return a;
}

JSFiddle http://jsfiddle.net/hepsyqyw/

Louis Ricci
  • You don't need `uniq` at all. Toggle `v`'s membership in the set when you see it. Elements that appear an even number of times won't be part of the set when you're done. That's still O(n) additional space, since there is no guarantee that the duplicates come in nearby pairs. The question asked for O(1) space complexity. – Peter Cordes Sep 11 '15 at 14:36
  • No hash table guarantees O(1) lookup. – Michael Laszlo Sep 11 '15 at 14:39
  • @MichaelLaszlo: Are you replying to my comment about space? Or are you saying you can't implement a hash table with guaranteed O(1) time lookups for worst-case inputs? I think you can get amortized O(1) lookups by growing the hash table when / if needed. – Peter Cordes Sep 11 '15 at 14:44
  • No, the worst case is O(n) amortized time per operation. – Michael Laszlo Sep 11 '15 at 14:48
  • @MichaelLaszlo: Linear search is O(n) per operation. Hash lookup is O(n) amortized time per *n* operations, aka amortized constant time. – Peter Cordes Sep 11 '15 at 14:50
  • @MichaelLaszlo The amortized `O(n)` time complexity is only for a constant-size hashtable, which is contradictory to what @PeterCordes stated. I quote: `I think you can get amortized O(1) lookups by growing the hash table when / if needed.` See: http://www.cs.cornell.edu/courses/cs3110/2011sp/lectures/lec20-amortized/amortized.htm – fjardon Sep 11 '15 at 14:52
  • @fjardon: Quoting that link: "the insertion of n elements into an empty array takes only O(n) time in all, including the cost of resizing." That's what I was trying to say earlier. I didn't read the whole page; does it cover worst-case inputs given your choice of hash function? – Peter Cordes Sep 11 '15 at 14:54
  • @PeterCordes Maybe I wasn't clear, (I'm not native speaker), but the page says you're correct in your analysis. If you resize the hashtable, the amortized complexity is `O(1)` per operation. And the page doesn't discuss hash functions but only hashtable in general. – fjardon Sep 11 '15 at 15:03
  • How do you rehash the elements in linear time? That seems to rely on a perfect hash function. – Michael Laszlo Sep 11 '15 at 15:05
  • @MichaelLaszlo: It's quite slow, O(current_size). You do have to re-hash every entry into the new hash table. That's why you have to exponentially increase the size of the hash table when you need to grow it, to avoid having to do this very often. I think the math is: sum(n * 2^-i) is O(n) – Peter Cordes Sep 11 '15 at 15:10
  • As Peter Cordes already said, you are creating a new set which might have to store about n/2 elements in the worst case. So this solution requires Omega(n) additional space. – Andreas T Sep 12 '15 at 08:33
-1

Assuming you have the input array [2,3,4,2,4], the output is 3.

In Ruby, you can do something as simple as this:

[2,3,4,2,4].inject(0) {|xor, v| xor ^ v}
remonses
-3
  1. Create an array counts that has INT_MAX slots, with each element initialized to zero.

  2. For each element in the input list, increment counts[element] by one. (edit: actually, you will need to do counts[element] = (counts[element]+1)%2, or else you might overflow the value for really ridiculously large values of N. It's acceptable to do this kind of modulus counting because all duplicate items appear an even number of times.)

  3. Iterate through counts until you find a slot that contains "1". Return the index of that slot.

Step 2 is O(N) time. Steps 1 and 3 take up a lot of memory and a lot of time, but neither one is proportional to the size of the input list, so they're still technically O(1).


(note: this assumes that integers have a minimum and maximum value, as is the case for many programming languages.)
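
A hedged C sketch of this approach for 32-bit inputs (the function name is mine; instead of full counts it keeps one parity bit per possible value, along the lines of the bitmap variant Peter Cordes suggests in the comments below, so the table is 512 MiB and independent of N):

#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

/* Returns a value that occurs an odd number of times among 32-bit inputs.
   The table size depends only on the integer range, never on n.
   (A real program would check for allocation failure.) */
uint32_t find_odd_one(const uint32_t *array, size_t n)
{
    /* 2^32 possible values, one parity bit each: 2^29 bytes = 512 MiB */
    uint8_t *parity = calloc((size_t)1 << 29, 1);
    for (size_t i = 0; i < n; i++)
        parity[array[i] >> 3] ^= (uint8_t)(1u << (array[i] & 7));   /* toggle */

    uint32_t found = 0;
    for (uint64_t v = 0; v < ((uint64_t)1 << 32); v++)              /* step 3: scan */
        if (parity[v >> 3] & (1u << (v & 7))) { found = (uint32_t)v; break; }

    free(parity);
    return found;
}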

Kevin
  • By that standard of "technically correct", every deterministic program ever written that terminates does so in constant, i.e. O(1) time because it works on finite memory. – G. Bach Sep 11 '15 at 13:58
  • HA! That's a pretty fuzzy idea for O(1). What if it's a 64 bit integer? You're going to create an array that is 2^64 long and claim the memory space is constant? I mean, *yes* but...no. :) – aquinas Sep 11 '15 at 13:58
  • Not sure what you mean by that. If Steps 1 and 3 take an hour to complete when N = 10, and an hour to complete when N = 1000000, then it's O(1). – Kevin Sep 11 '15 at 13:59
  • Indeed the required space is constant in a technical sense, but I doubt that the OP had this in mind. For 32-bit integers, the table would need 4 gigabytes of memory. Most probably the OP imagined some constant space _independent_ of the size of the actual implementation of `int`. – Codor Sep 11 '15 at 13:59
  • What I mean is that step 3 is only in O(1) space if the counts array has constant size - which it can't have if the algorithm has to be correct. – G. Bach Sep 11 '15 at 14:00
  • The counts array size varies with the maximum size of an integer, but that's completely independent from N. – Kevin Sep 11 '15 at 14:01
  • The size of integers is not constant in the RAM model. – G. Bach Sep 11 '15 at 14:02
  • It doesn't need to be constant, it only needs to be independent from N. – Kevin Sep 11 '15 at 14:02
  • Reductio ad absurdum: If Kevin's argument is correct, then sorting takes O(n) time because you can always use radix sort with an array of size INT_MAX. – Michael Laszlo Sep 11 '15 at 14:02
  • Yes it does need to be constant, otherwise I use an array with MAX_INT + 1 different elements, and your algorithm fails. – G. Bach Sep 11 '15 at 14:03
  • I don't think an array of integers can have MAX_INT+1 different unique values, by definition. – Kevin Sep 11 '15 at 14:04
  • Then you're talking about the finite state machines that real world computers are, not algorithms in the relevant sense. In that case, everything that can be achieved can be achieved in O(1) time AND space, and the concept becomes irrelevant. – G. Bach Sep 11 '15 at 14:05
  • I don't agree. For instance, my step 2 can't be done in O(1) time. – Kevin Sep 11 '15 at 14:08
  • With the conception of time and space you're applying to call the size of `counts` constant, yes it can. You can only store constant-sized arrays on your constant-sized memory, and you can iterate constant-sized arrays in constant time. – G. Bach Sep 11 '15 at 14:09
  • But the input array isn't constant-sized, it's dependent on N. – Kevin Sep 11 '15 at 14:09
  • ...then this isn't O(1) space. – G. Bach Sep 11 '15 at 14:10
  • Agreed, but it's O(1) _additional_ space. – Kevin Sep 11 '15 at 14:11
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/89391/discussion-between-g-bach-and-kevin). – G. Bach Sep 11 '15 at 14:11
  • You're *technically* correct, but in *practice* the answer doesn't work. I'm trying to sort 64 bit integers. Yes, I can declare an array that is 2^64 long, but I bet I don't have that much ram :). So yes *theoretically* this is correct, but practically...not so much. – aquinas Sep 11 '15 at 14:11
  • You don't need to count, you just need a `bitmap[UINT32_MAX]` (assuming your inputs are 32bit `int`s, not arbitrary-sized integers like @G.Bach points out.) Toggle the bit for each input number. When you're done that, look for the first set bit. – Peter Cordes Sep 11 '15 at 14:21
  • You are right, you are technically correct, but as other users already wrote, this was not a solution I was looking for. Maybe I should have mentioned that we are dealing with unbounded integers and the space complexity is something like O(log m), where m is the highest number appearing in the array. Thank you for the input though. – Andreas T Sep 12 '15 at 08:30