40

I have a question that I have thought over again and again... but got nothing, so I am posting it here. Maybe I could get some viewpoints from others to try and make it work...

The question is: we are given a SORTED array, which consists of a collection of values occurring an EVEN number of times, except one, which occurs an ODD number of times. We need to find that value in O(log n) time.

It is easy to find the solution in O(n) time, but it looks pretty tricky to do in O(log n) time.

Bill the Lizard
AGeek
  • Are the numbers in the array consecutive integers? It seems none of the answers so far are better than O((log N)^2), but also none directly uses the sorted nature of the array. – Pete Kirkham Jul 05 '10 at 17:39
  • Cool problem! Where do they assign such nice problems as homework? :) – Kris Jul 05 '10 at 18:38
  • There's a great answer below by "throwawayacct", which seems to show (I can't find any flaw in the proof) that any algorithm is Ω((log n)^2). – ShreevatsaR Jul 05 '10 at 23:33
  • http://www.technicalinterviewquestions.net/2009_01_01_archive.html - not the solution to the above, but an interesting solution in O(n) time with O(1) storage. – Will A Jul 05 '10 at 23:33
  • @Will A: O(N) time and O(1) solution is trivial, I can detect kth odd number with such requirements. – Matthieu M. Jul 06 '10 at 07:05
  • @AGeek Please post the solution after the acceptance period for Your assignment ends. Thx. – Dave O. Jul 06 '10 at 10:59
  • @Bragboy will be difficult if all questions are as hard to answer as this one^^. – Dave O. Jul 06 '10 at 11:56
  • This looks more like an interview question than homework. – rettvest Jul 06 '10 at 20:35
  • Note that this search (without the – joe snyder Jul 09 '10 at 07:19
  • @joe snyder and anybody else: A solution to the problem that runs in O(log(n)^2) is trivial. A solution that even runs in O(log(n)) is yet to be found(!) - or a proof that there is no such algorithm. – Dave O. Jul 09 '10 at 16:07
  • @Dave: the desire for a faster search was clear; i just thought the sort-agnostic XOR was worth noting for its cleverness, although Will A actually ref'd it first. – joe snyder Jul 10 '10 at 02:35
  • @Dave: I thought through the matter and I am convinced that user "throwawayacct" has a complete proof that you cannot do better than O(log(n)^2). You might as well just accept his solution because he did the hard part. – Greg Kuperberg Jul 11 '10 at 21:17
  • @Greg: I am still not convinced with the proof (but that is probably more my issue than an issue with the proof). For instance the ability to guarantee Omega(logn) time for _every_ boundary finding seems to be lacking. The claim that one boundary finding is 'independent' from the other seems to require more justification, as the past history of compares done by the algorithm does help narrow down the future boundaries, but that seems to be ignored in the proof. –  Jul 12 '10 at 07:00
  • I am convinced now after the rewrite by throwawayacct and supporting answer from Greg, fwiw. –  Jul 12 '10 at 17:57
  • @AGeek we're still curious about the solution/Your prof's opinion on the SO community's answers! Please don't keep us in the dark ;) – Dave O. Jul 12 '10 at 21:29

14 Answers

30

Theorem: Every deterministic algorithm for this problem probes Ω((log n)²) memory locations in the worst case.

Proof (completely rewritten in a more formal style):

Let k > 0 be an odd integer and let n = k². We describe an adversary that forces (log₂(k + 1))² = Ω((log n)²) probes.
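To spell out the asymptotics: since n = k², we have

    \[
    \log_2 k = \tfrac{1}{2}\log_2 n,
    \qquad\text{so}\qquad
    \bigl(\log_2(k+1)\bigr)^2 \;\ge\; (\log_2 k)^2 \;=\; \tfrac{1}{4}\,(\log_2 n)^2 \;=\; \Omega\bigl((\log n)^2\bigr).
    \]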

We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x1 x2 … xk. For each segment xj, there exists an integer bj ∈ [0, k] such that xj consists of bj copies of j - 1 followed by k - bj copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.

Group boundaries
|   |     |   |   |
 0 0 1 1 1 2 2 3 3
|     |     |     |
Segment boundaries

Wherever there is an increase of two, we assume a double boundary by convention.

Group boundaries
|     ||       |   |
 0 0 0  2 2 2 2 3 3

Claim: The location of the jth group boundary (1 ≤ j ≤ k) is uniquely determined by the segment xj.

Proof: It's just after the ((j - 1) k + bj)th memory location, and xj uniquely determines bj. //

We say that the algorithm has observed the jth group boundary in case the results of its probes of xj uniquely determine xj. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.

Group boundaries
|   X   |   |     |
 0 0 ? 1 2 2 3 3 3
|     |     |     |
Segment boundaries

Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".

Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.

Proof: Every other consecutive pair bounds only even groups. //

Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.

Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.

Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the jth group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to xj determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //

Example:

Observed group boundaries; relevant subsequence marked by […]
[             ]   |
 0 0 Y 1 1 Z 2 3 3
|     |     |     |
Segment boundaries

Possibility #1: Y=0, Z=2
Possibility #2: Y=1, Z=2
Possibility #3: Y=1, Z=1

As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.

At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of xj, the adversary must commit to one of two values: j - 1, or j. If it can avoid letting the jth boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant interval and commits values for the others.

In this way, the adversary forces the algorithm to observe at least log₂(k + 1) group boundaries, and in observing the jth group boundary, the algorithm is forced to make at least log₂(k + 1) probes.


Extensions:

This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.

It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).

user382751
  • Your argument applies to the class of algorithms that sequentially perform binary searches in order to find boundaries. There might be other approaches, so writing "every deterministic algorithm" is a bit of a stretch. Your argument is valuable and quite clever and addresses the approach that is probably on everyone's mind. I'd only drop the "every deterministic algorithm" part. – Bolo Jul 05 '10 at 22:24
  • No, it's a general lower bound because the structure of the input limits the amount of information that can be gathered by one query. – user382751 Jul 05 '10 at 23:04
  • You're right, this *is* a general argument and seems convincing. (The only doubtful part to me was whether the adversary could make one side odd despite previous probes, but yes, it can.) Great answer! – ShreevatsaR Jul 05 '10 at 23:09
  • I am not completely convinced. Why does every algorithm _have_ to find boundaries? Why do we even need to have information about boundaries at all? Even assuming that every algorithm has to determine boundaries somehow, how can we ensure the Omega(logn) probe time for the subsequent boundaries? –  Jul 06 '10 at 00:43
  • About the omega(log n) lower bound: Say the algorithm looks at 0. Steps to the right, finds a 3, steps to the left and finds a 2, steps more to the left and finds a 1, then a 0 etc. Now we could use the information about the 1, 2 and 3 in subsequent boundary finding, can't we? How have you ensured that this information does not affect the Omega(logn) probe time? Looking for the 0-1 boundary has not told you where the 1-2 boundary is, but it does help you narrow it down. So it is not exactly independent... –  Jul 06 '10 at 02:11
  • That's true in general, but for the set of inputs used by this lower bound, _a priori_, the sets of locations for two different boundaries may touch but not overlap, preventing that sort of inference. – user382751 Jul 06 '10 at 02:16
  • Not 100% convinced, but I think you're on the right track. This kind of analysis of the problem is worth a lot more than the source code with wrong big-O estimates lots of other ppl keep posting... – R.. GitHub STOP HELPING ICE Jul 06 '10 at 02:20
  • I don't see how you can avoid overlaps. What if the algorithm jumped more than sqrt(n) distance for some probe? Anyway all I am saying is that the rightmost 1 and left most 2 is what the next boundary is narrowed down to. How do you ensure that it is Omega(n^c)? With multiple boundaries, you have multiple such ranges. Anyway, sorry for so many questions... –  Jul 06 '10 at 03:04
  • Same issue with the potential function: Case 1) At best halved the possible locations of a boundary. We could affect possibly more than one boundaries and their possible locations. Case 2) At some point when we pin down a boundary, the previous set of possibilities for that boundary need not be k+1 and so the offsetting increase term need not be log(k+1). Sorry, but I am not convinced, but it is probably my inability to understand what you are doing exactly. You have spent a lot of effort trying to explain to me already, I won't bother you anymore. –  Jul 06 '10 at 13:48
  • I think your answer is much too pessimistic because of the complexity / cost model that you have. As the answer given by Rotsor points out, things change when you go to more realistic models. In particular I think that any useful lower bound should also involve his parameter `K`. – Jens Gustedt Jul 08 '10 at 11:44
  • Since this is not a "real" problem, I see no reason why a "realistic" model should have a parameter K. Nevertheless, with N elements and at most K copies of any one element, this argument readily generalizes to yield a lower bound of Ω(log N log K). – user382751 Jul 08 '10 at 15:01
  • (The comments above apply to the previous version.) – user382751 Jul 12 '10 at 17:08
  • Great, I wish I could upvote this again! See also Greg Kuperberg's rewrite of this answer below (and the older version of this answer). – ShreevatsaR Jul 12 '10 at 18:14
  • @throwawayacct thx for Your effort in rewriting Your argument. Only that made me able to understand it. – Dave O. Jul 12 '10 at 21:27
  • @user382751 - consider this. Given a range R s.t. |R| is even, we can compare its endpoints in constant time. If they're the same, we can eliminate R from consideration. If not, then it's lg(R) operations (binary search) to find the boundary in R. If we choose R to be in the center of the part of the array we're searching, this eliminates (roughly) half the search space in lg(R) time. Our two cases are that either we eliminate R indices in constant time or n/2 in lg(R) time. Solve R*lg(R) = n/2 to find the best value for R. This takes lg(R)*lg(N). – Dave Oct 03 '14 at 02:39
  • @user382751 In simple English this means the task cannot be done in O(log N) time complexity, doesn't it? – SadeepDarshana Sep 30 '17 at 08:53
15

A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simply means a group with an odd number of elements. We can do the comparison by observing the index of the first or last element of the group: the first element will be at an even index (0-based) before the odd group, and at an odd index after the odd group. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).
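For concreteness, here is a sketch of this algorithm in C# (the class and helper names are mine, not from the original answer):

static class OddOccurrence
{
    // First index of v within [lo, hi): a standard lower-bound binary search.
    static int FirstIndexOf(int[] a, int v, int lo, int hi)
    {
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] < v) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Last index of v within [lo, hi); v must occur in that window.
    static int LastIndexOf(int[] a, int v, int lo, int hi)
    {
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] <= v) lo = mid + 1; else hi = mid;
        }
        return lo - 1;
    }

    // Returns the value that occurs an odd number of times: O((log N)²).
    public static int Find(int[] a)
    {
        int lo = 0, hi = a.Length; // invariant: the odd group lies in [lo, hi), lo stays even
        while (true)
        {
            int v = a[lo + (hi - lo) / 2];
            int first = FirstIndexOf(a, v, lo, hi);
            int last = LastIndexOf(a, v, lo, hi);
            if ((last - first + 1) % 2 == 1)
                return v;      // "equality": the group of v has odd size
            if (first % 2 == 0)
                lo = last + 1; // prefix [0, last] is even, so the odd group is to the right
            else
                hi = first;    // prefix [0, first) is odd, so the odd group is to the left
        }
    }
}

Each round performs two O(log N) binary searches and at least halves the window containing the odd group, which is exactly the recurrence analyzed below.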

PROOF OF O((log N)²)

  T(2) = 1 //to make the summation nice
  T(N) = log(N) + T(N/2) //log(N) is finding the first/last elements

For some N=2^k,

T(2^k) = (log 2^k) + T(2^(k-1))
       = (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
       = (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
       = k + (k-1) + (k-2) + ... + 1
       = k(k+1)/2
       = (k² + k)/2
       = (log(N)² + log(N))/ 2
       = O(log(N)²)
Nabb
  • Nitpick: an O(log N) algorithm is also an O((log N)^2) algorithm. The Ω is more interesting here. – Matthieu M. Jul 06 '10 at 07:13
  • @Matthieu... Nitpick the nitpick: The Theta is even more interesting :-) –  Jul 06 '10 at 13:57
  • @Moron: Yes I guess so :p However the current discussion seemed to evolve into proving it was impossible to do it in less than (log N)^2. I still don't see how it can be proved though; sorting algorithms proved that given some constraints you could get better asymptotic performance than the common models suggested, so I wonder if it could be applied here, somehow. – Matthieu M. Jul 07 '10 at 06:22
  • On the other hand, while an input with 0s, 1s and 2s is tailor-made for an O(n) counting sort, binary search is still worst-case optimal at Θ(log n) queries. – user382751 Jul 07 '10 at 22:16
  • This is a nice answer, but the question asks for an O(log n) solution and this does not achieve it — only O((log n)^2), which many others also got. So why is this the highest voted answer? :p – ShreevatsaR Jul 09 '10 at 07:35
  • @ShreevatsaR: I agree, why is the less optimal algorithm getting the upvotes here? – Dean J Jul 12 '10 at 17:21
  • Well, this is still a good answer, with a concise explanation of the best known algorithm and a complete analysis. Given the proof contained in the answer that is now thankfully highest-voted, this algorithm is optimal, and deserves upvotes. :-) – ShreevatsaR Jul 12 '10 at 18:13
11

Look at the middle element of the array. With a couple of appropriate binary searches, you can find its first and last appearance in the array. E.g., if the middle element is 'a', you need to find i and j as shown below:

[* * * * a a a a * * *]
         ^     ^ 
         |     |
         |     |
         i     j

Is j - i an even number? You are done! Otherwise (and this is the key here), the question to ask is: is i an even or an odd number? Do you see what this piece of knowledge implies? Then the rest is easy.
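A small worked example (mine, not from the original answer), using 0-based indices:

    [1 1 2 2 2 3 3 3 3]   the middle element is 2; the two searches give i = 2, j = 4

Here j - i = 2 is even, so the group's size j - i + 1 = 3 is odd and 2 is the answer. Had j - i been odd (an even-sized group), the parity of i decides the side: every group before the odd one starts at an even index, so an even i sends you right and an odd i sends you left.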

Dimitris Andreou
  • Ah, I see Jerry gave it away! :) – Dimitris Andreou Jul 05 '10 at 17:05
  • If j-i is even, then you have to repeat your algorithm with half the array. The *i*th iteration takes (log N) - i steps so you have O((log N)^2) steps. – Pete Kirkham Jul 05 '10 at 17:32
  • Not really. At the ith iteration, the two binary searches only need log(N / 2^i) - at each step, the array portion against which I have to do the two binary searches at least halves. – Dimitris Andreou Jul 05 '10 at 18:20
  • @Dimitris - you can almost halve your number of binary searches by only searching for one bound on each iteration. Half the time you'll find the odd count on the larger side, but you don't need to worry about unbalanced bisection. To get that, you need a large number of repeats of the same value, which is handled pretty quickly when it spans the centre of the current range, so it's not a worst case. –  Jul 05 '10 at 18:30
  • @Dimitris log(N / 2^i) = log(N) - i, so it **is** O(log(n)^2) – Rotsor Jul 05 '10 at 18:44
  • @Steve314, also right. Though I don't see this affecting the big oh analysis. Still wondering what the O(logn) solution would be (without assumptions on the input). – Dimitris Andreou Jul 05 '10 at 19:45
6

This answer is in support of the answer posted by "throwawayacct". He deserves the bounty. I spent some time on this question and I'm totally convinced that his proof is correct: you need Ω(log(n)^2) queries to find the number that occurs an odd number of times. I'm convinced because I ended up recreating the exact same argument after only skimming his solution.

In the solution, an adversary creates an input to make life hard for the algorithm, but also simple for a human analyzer. The input consists of k pages that each have k entries. The total number of entries is n = k^2, and it is important that O(log(k)) = O(log(n)) and Ω(log(k)) = Ω(log(n)). To make the input, the adversary makes a string of length k of the form 00...011...1, with the transition in an arbitrary position. Then each symbol in the string is expanded into a page of length k of the form aa...abb...b, where on the ith page, a=i and b=i+1. The transition on each page is also in an arbitrary position, except that the parity agrees with the symbol that the page was expanded from.

It is important to understand the "adversary method" of analyzing an algorithm's worst case. The adversary answers queries about the algorithm's input, without committing to future answers. The answers have to be consistent, and the game is over when the adversary has been pinned down enough for the algorithm to reach a conclusion.

With that background, here are some observations:

1) If you want to learn the parity of a transition in a page by making queries in that page, you have to learn the exact position of the transition and you need Ω(log(k)) queries. Any collection of queries restricts the transition point to an interval, and any interval of length more than 1 has both parities. The most efficient search for the transition in that page is a binary search.

2) The most subtle and most important point: There are two ways to determine the parity of a transition inside a specific page. You can either make enough queries in that page to find the transition, or you can infer the parity if you find the same parity in both an earlier and a later page. There is no escape from this either-or. Any set of queries restricts the transition point in each page to some interval. The only restriction on parities comes from intervals of length 1. Otherwise the transition points are free to wiggle to have any consistent parities.

3) In the adversary method, there are no lucky strikes. For instance, suppose that your first query in some page is toward one end instead of in the middle. Since the adversary hasn't committed to an answer, he's free to put the transition on the long side. (A small simulation of this appears right after this list.)

4) The end result is that you are forced to directly probe the parities in Ω(log(k)) pages, and the work for each of these subproblems is also Ω(log(k)).

5) Things are not much better with random choices than with adversarial choices. The math is more complicated, because now you can get partial statistical information, rather than a strict yes you know a parity or no you don't know it. But it makes little difference. For instance, you can give each page length k^2, so that with high probability, the first log(k) queries in each page tell you almost nothing about the parity in that page. The adversary can make random choices at the beginning and it still works.
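To make observations (1) and (3) concrete, here is a toy single-page adversary in C# (the class and all names are mine, not from the answer). It never commits to the transition point t; it answers each probe so that the larger half of the remaining possibilities survives:

using System;

// A page holds t copies of value 'a' followed by (k - t) copies of 'b',
// where the transition point t in [0, k] is left uncommitted.
class PageAdversary
{
    private int lo, hi; // t is still consistent with every value in [lo, hi]
    public int Probes { get; private set; }

    public PageAdversary(int k) { lo = 0; hi = k; }

    // The parity of t is pinned down only when a single candidate remains;
    // any interval of length >= 2 contains both parities.
    public bool ParityKnown { get { return lo == hi; } }

    // Probe cell p (0-based). Returns false for 'a', true for 'b'.
    public bool Probe(int p)
    {
        Probes++;
        // Answering 'a' forces t >= p + 1; answering 'b' forces t <= p.
        int keepIfA = hi - Math.Max(lo, p + 1) + 1;
        int keepIfB = Math.Min(hi, p) - lo + 1;
        if (keepIfA >= keepIfB) { lo = Math.Max(lo, p + 1); return false; }
        hi = Math.Min(hi, p);
        return true;
    }
}

Every answer is consistent with all previous ones, and each probe at best halves the set of candidate transition points, so ParityKnown cannot become true in fewer than ceil(log2(k + 1)) probes; a probe near one end of the page (observation 3) barely shrinks the set at all.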

Greg Kuperberg
  • D'oh. Looks like we both had the idea of rewriting this argument :P – user382751 Jul 12 '10 at 16:48
  • So it goes. I didn't mean my answer to be only a rewrite, but also an annotation or summary. Maybe it's still useful for that purpose. – Greg Kuperberg Jul 12 '10 at 17:10
  • very useful indeed. i was only able to follow throwawayacct's proof after his rewrite and even then it was very difficult. Your explanations have a great clarity - wish my math prof had a similar skill. thx for Your effort. kudos. – Dave O. Jul 12 '10 at 21:22
  • @Dave: You're very welcome and, seriously, praise like yours helps keep me going. – Greg Kuperberg Jul 12 '10 at 21:52
  • @GregKuperberg What do you think of this approach? Suppose we proceed by repeatedly doing the following: 1) define R to be a range of elements of even length in the center of the remaining search space. 2) Check the end points of R. 3) If they are the same then remove R from the search space; If not then there is a boundary in R, which we can find via binary search on R in log R time. I.e., we either reduce the search space by R in O(1) or by ~ half in O(R). – Dave Oct 04 '14 at 02:07
  • @DaveGalvin - One way or another it can't help in the worst case, because user382751 proved that it is Ω(log(n)^2) and it is easy to see that it is O((log N)^2). Probably R is in the worst case too short to help you much. – Greg Kuperberg Oct 13 '14 at 02:52
5

Start at the middle of the array and walk backward until you get to a value that's different from the one at the center. Check whether the number above that boundary is at an odd or even index. If it's odd, then the number occurring an odd number of times is to the left, so repeat your search between the beginning and the boundary you found. If it's even, then the number occurring an odd number of times must be later in the array, so repeat the search in the right half.

As stated, this has both a logarithmic and a linear component. If you want to keep the whole thing logarithmic, then instead of just walking backward through the array to a different value, you want to use a binary search. Unless you expect many repetitions of the same numbers, though, the binary search may not be worthwhile.

Jerry Coffin
  • Isn't this `O((log n)^2)` since you are repeating the binary search `log n` times in the worst case? – IVlad Jul 05 '10 at 17:19
  • @IVlad: not really -- after each search, you discard (at least) half the array, so in each subsequent search, the amount you're searching also decreases logarithmically. – Jerry Coffin Jul 05 '10 at 17:37
  • `(log n)+(log n/2)+(log n/4)+... = log(n^(log n)*2^(-(log n)((log n)-1)/2)) = (log n)²-(log n)((log n)-1)/2 = (log n)²/2+(log n)/2 = O((log n)²)`. For WolframAlpha: "sum log_2(2^k/2^i) for i=0..k-1". – Nabb Jul 05 '10 at 17:50
  • yes, but in order to discard that half you have to run another binary search, which makes it log square. I think posting some pseudocode will make this clearer. – IVlad Jul 05 '10 at 17:51
3

I have an algorithm which works in O(log(N/C)*log(K)) time, where K is the length of the maximum same-value range and C is the length of the range being searched for.

The main difference of this algorithm from most posted before is that it takes advantage of the case where all same-value ranges are short. It finds boundaries not by binary-searching the entire array, but by first quickly finding a rough estimate by jumping back by 1, 2, 4, 8, ... steps (log(K) iterations), and then binary-searching the resulting range (log(K) again).

The algorithm is as follows (written in C#):

// Finds the start of the range of equal numbers containing the index "index", 
// which is assumed to be inside the array
// 
// Complexity is O(log(K)) with K being the length of range
static int findRangeStart (int[] arr, int index)
{
    int candidate = index;
    int value = arr[index];
    int step = 1;

    // find the boundary for binary search:
    while(candidate>=0 && arr[candidate] == value)
    {
        candidate -= step;
        step *= 2;
    }

    // binary search:
    int a = Math.Max(-1, candidate); // exclusive lower bound; may be -1 when the range starts at index 0
    int b = candidate+step/2;
    while(a+1!=b)
    {
        int c = (a+b)/2;
        if(arr[c] == value)
            b = c;
        else
            a = c;
    }
    return b;
}

// Finds the index after the only "odd" range of equal numbers in the array.
// The result should be in the range (start; end]
// The "end" is considered to always be the end of some equal number range.
static int search(int[] arr, int start, int end)
{
    if(arr[start] == arr[end-1])
        return end;

    int middle = (start+end)/2;

    int rangeStart = findRangeStart(arr,middle);

    if((rangeStart & 1) == 0)
        return search(arr, middle, end);
    return search(arr, start, rangeStart);
}

// Finds the index after the only "odd" range of equal numbers in the array
static int search(int[] arr)
{
    return search(arr, 0, arr.Length);
}
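
For reference, a quick usage sketch (hypothetical input; 3 is the value with the odd count):

// int[] arr = { 1, 1, 2, 2, 2, 2, 3, 3, 3, 5, 5 };
// int afterOdd = search(arr);       // index just after the odd range, here 9
// int oddValue = arr[afterOdd - 1]; // == 3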
Rotsor
  • I didn't read your code entirely, but +1: you seem to be the first one to observe that `K` has to be part of the complexity of the problem. Maybe you could add a textual description of your ideas? – Jens Gustedt Jul 08 '10 at 09:08
2

Take the middle element e. Use binary search to find its first and last occurrence: O(log(n)). If the count is odd, return e. Otherwise, recurse onto the side that has an odd number of elements: [....]eeee[....]

Runtime will be log(n) + log(n/2) + log(n/4).... = O(log(n)^2).

Chad Brewbaker
1

Ahhh. There is an answer.

Do a binary search and as you search, for each value, move backwards until you find the first entry with that same value. If its index is even, it is before the oddball, so move to the right.
If its array index is odd, it is after the oddball, so move to the left.

In pseudocode (this is the general idea, not tested...):

    private static int FindOddBall(int[] ary)
    {
        int l = 0,
            r = ary.Length - 1;
        int n = (l+r)/2;
        while (r > l+2)
        {
            n = (l + r) / 2;
            while (ary[n] == ary[n-1])
                n = FindBreakIndex(ary, l, n);
            if (n % 2 == 0) // even index we are on or to the left of the oddball 
                l = n;
            else            // odd index we are to the right of the oddball
                r = n-1;
        }
        return ary[l];
    }
    private static int FindBreakIndex(int[] ary, int l, int n)
    {
        var t = ary[n];
        var r = n;
        while(ary[n] != t || ary[n] == ary[n-1])
            if(ary[n] == t)
            {
                r = n;
                n = (l + r)/2;
            }
            else
            {
                l = n;
                n = (l + r)/2;
            }
        return n;
    }
Charles Bretana
  • I don't think so, as this algorithm is based on a binary search... doubling the size of the array would not double the number of iterations, it would only increase it by 1... As I understand it, that's O(log n) – Charles Bretana Jul 05 '10 at 17:30
  • @Charles Bretana - doubling the size of the array **would** double the number of iterations of your algorithm. Consider an array of `n - 1` equal numbers and 1 other number. Your innermost while loop is `O(n)` because it will iterate over `n / 2` elements. – IVlad Jul 05 '10 at 17:33
  • Yr example is a special case, which can be corrected by changing the inner while loop to do the same thing (see EDIT). In general, each outer iteration eliminates half the array, the half on the 'wrong' side of the element being examined, so no, it would not... – Charles Bretana Jul 05 '10 at 17:38
  • @Charles Bretana - special case or not, it makes your current algorithm `O(n)`. I don't see any edit, but if we're thinking about the same thing it will still not be `O(log n)` but `O((log n)^2)`. – IVlad Jul 05 '10 at 17:49
  • Why the cryptic variable names? – Svante Jul 05 '10 at 18:27
  • @IVlad, You may be right about `O((log n) ^2)`. In general, nested binary iterations would give you `O((log n) ^2)`, as you said, but this scenario, I think, may be different, because here they are not totally independent nested iterations. Neither the inner nor the outer iteration must iterate over every element, only those elements not eliminated by the other iteration... This will definitely reduce the number of processing steps required and may mitigate the "O-ness" of the algorithm (although I must confess that I'm not sure exactly how to calculate that.) – Charles Bretana Jul 05 '10 at 23:26
  • @Charles - they actually are totally independent, why do you say they're not? I'm not sure what you mean by "iterate over every element" - they're binary searches, of course they don't iterate over every element. The problem is that you have an inner search that binary searches a range that was binary searched by your outer search (weird sentence, I know). I'm sure the algorithm is very fast in practice, but strictly speaking it **is** `O((log n)^2)` because the recurrence is `T(n) = log n + T(n / 2)` which is easily shown to be log square. – IVlad Jul 06 '10 at 00:01
  • @IVLad, I'll defer to yr judgment, as I'm not sure myself how to calculate the O-ness... what I meant, however, is perhaps best illustrated by yr example (array of n - 1 equal numbers and 1 other number): in this case my outer iteration would only process once (maybe twice), and the inner iteration would handle the rest of the work... whereas for n/2 pairs of different numbers and one single number, the inner search would process only once per pair, and the outer iteration would handle the bulk of the work... – Charles Bretana Jul 06 '10 at 00:42
  • @VLad, How to measure this? Run it with arrays of 1,000 elements, 10,000 elements, and then 100,000, and compare the results? – Charles Bretana Jul 06 '10 at 00:44
  • @Charles - I think it's going to be hard to determine if it's log square or log, especially if you have a fast processor. You could try comparing your algorithm with a binary search for a million or even 10 million elements, preferably on a slower computer. That might tell us something. Make sure to run them multiple times and take the average. – IVlad Jul 06 '10 at 08:13
  • @VLad, I will try that... Do you see what I mean? As the algorithm searches, each portion, segment, (or "piece") of the array is processed and eliminated either by the outer iteration or by the inner one, but never both... – Charles Bretana Jul 06 '10 at 13:11
  • @Charles - I see, but I'm not sure I agree :). It would be nice if others shared their thoughts. – IVlad Jul 06 '10 at 14:17
  • @VLad, i couldn't ask for more... frankly, till i test this, i'm not sure either... but appreciate that you understand the point... – Charles Bretana Jul 06 '10 at 15:26
1

You can use this algorithm:

int GetSpecialOne(int[] array, int length)
{
   int specialOne = array[0];

   for(int i=1; i < length; i++)
   {
      specialOne ^= array[i];
   }
   return specialOne;
}

Solved with the help of a similar question which can be found at http://www.technicalinterviewquestions.net

ServAce85
pramod
0

Use a hash table

For each element E in the input set
    if E is in the hash table
        increment its count
    else
        add E to the hash table with a count of 1

For each key K in the hash table
    if the count of K is odd
        return K

As this algorithm makes two passes over the n elements (2n operations), it belongs to O(n).
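
A minimal C# sketch of this approach (the method and variable names are mine):

using System;
using System.Collections.Generic;

static class OddFinder
{
    public static int FindOddOccurrence(int[] input)
    {
        // First pass: count the occurrences of each value.
        var counts = new Dictionary<int, int>();
        foreach (int e in input)
        {
            counts.TryGetValue(e, out int c); // c is 0 when e is not present yet
            counts[e] = c + 1;
        }

        // Second pass: return the value whose count is odd.
        foreach (var kv in counts)
            if (kv.Value % 2 == 1)
                return kv.Key;

        throw new InvalidOperationException("no value occurs an odd number of times");
    }
}

Note that this needs O(n) extra space, unlike the XOR approach above, which runs in O(1) space.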

David Carpenter
0

We don't have any information about the distribution of lengths inside the array, or of the length of the array as a whole, right?

So the array length might be 1, 11, 101, 1001 or something: at least 1, with no upper bound. The array must contain between 1 and (length-1)/2 + 1 distinct values ('numbers'); for total sizes of 1, 11, 101 that is 1, 1 to 6, and 1 to 51 distinct values, and so on.

Shall we assume every possible size to be of equal probability? That would lead to a mean subarray length of size/4, wouldn't it?

An array of size 5 could be divided into 1, 2 or 3 sublists.

What seems to be obvious is not that obvious, if we go into details.

An array of size 5 can be 'divided' into one sublist in just one way, with arguable right to call it 'dividing'. It's just a list of 5 elements (aaaaa). To avoid confusion, let's assume the elements inside the list are ordered characters, not numbers (a, b, c, ...).

Divided into two sublists, they might be (1, 4), (2, 3), (3, 2), (4, 1): (abbbb, aabbb, aaabb, aaaab).

Now let's look back at the claim made before: shall the 'division' (5) be assumed to have the same probability as those 4 divisions into 2 sublists? Or shall we mix them together and assume every partition to be equally probable (1/5)?

Or can we calculate the solution without knowing the probability of the length of the sublists?

user unknown
  • From my point of view the problem description is very clear: An array of length n containing sorted numbers. Each number occurs an even number of times, except one, which occurs an odd number of times. The except one implies that n >= 1, there are no further implications! (a), (a,a,a), (a,...(2*i times a)...,a,b) etc are all valid inputs. – Dave O. Jul 07 '10 at 19:31
  • No problem so far, but to decide which algorithm is best to find said number, we have to make assumptions about typical arrays, i.e. how often which case occurs, which isn't trivial for an unlimited number of possible problems. Maybe you can show an algorithm which produces all arrays of a length of 5, or all arrays up to a length of 5, or with another restriction, but the way you decide how to produce those arrays will have influence on the probability. – user unknown Jul 09 '10 at 02:33
0

The clue is you're looking for log(n). That's less than n.

Stepping through the entire array, one at a time? That's n. That's not going to work.

We know the first two indexes in the array (0 and 1) should be the same number. Same with 50 and 51, if the odd number in the array is after them.

So find the middle element in the array and compare it to the element right after it. If the change in numbers happens at the wrong index, we know the odd number in the array is before it; otherwise, it's after. With one set of comparisons, we figure out which half of the array the target is in.

Keep going from there.

Dean J
0

Try this:

int getOddOccurrence(int ar[], int ar_size)
{
     int i;
     int xor = 0; 
     for (i=0; i < ar_size; i++)     
        xor = xor ^ ar[i];

     return xor;
}

XOR will cancel out every time you XOR with the same number, so 1^1=0 but 1^1^1=1; every pair cancels out, leaving the odd number out.

khuong291
-3

Assume indexing starts at 0. Binary search for the smallest even i such that x[i] != x[i+1]; your answer is x[i].

edit: due to public demand, here is the code

int f(int *x, int min, int max) {
  int size = max;
  min /= 2;
  max /= 2;
  while (min < max) {
    int i = (min + max)/2;
    if (i==0 || x[2*i-1] == x[2*i])
      min = i+1;
    else
      max = i-1;
  }
  if (2*max == size || x[2*max] != x[2*max+1])
    return x[2*max];
  return x[2*min];
}
David Lehavi
  • How can one search for the "smallest even" with binary search? Imagine You pick an even probe near the middle of the array. Imagine further that x[i] == x[i+1]. How do You decide whether to proceed left or right of i? – Dave O. Jul 06 '10 at 11:24
  • @Dave: thanks for taking the point off w/o waiting for an answer. You now have i = 2j and x[i] == x[i+1], on the next iteration take j = j / 2, i = 2j. – David Lehavi Jul 06 '10 at 11:32
  • @David Lehavi let be x := {a, a, b, b, c}. First probe: i == 2 => j == 1. x[i] == x[i+1] == b (the first b). Second iteration: new_j == 0, new_i == 0. That's where I'm confused. Also, in my example there is no even i for which x[i] != x[i+1] – Dave O. Jul 06 '10 at 11:49
  • @Dave I edited my answer to answer your question. – David Lehavi Jul 06 '10 at 12:05
  • I don't get the second iteration. "first" and "last" are the minimum and maximum index, so 0 and 4 in the beginning - right? first = first / 2 = 0 / 2 = 0, last = last / 2 + 1 = 4 / 2 + 1 = 3 (!= 2 as in Your answer). Further, please note that in a={0,0,1} there is no even i for which a[i] != a[i+1] - even if Your algorithm was correct, Your wording wouldn't be. Can You please write a function "int oddNumber(int *array, int minIndex, int maxIndex)" (the indexes are inclusive)? Thx in advance. – Dave O. Jul 06 '10 at 12:24
  • `int nums [] = {1, 1, 1, 2, 2, 2, 2, 3, 3}; cout << f(nums, 0, 8);` returns `2` instead of `1`. I assumed 0-based indexing and that `max` is the highest valid index. – IVlad Jul 06 '10 at 13:04
  • @Dave, lVlad - oops, mixed the +1, -1 – David Lehavi Jul 06 '10 at 13:14
  • @David - still doesn't work on my example. Now it returns `3`. – IVlad Jul 06 '10 at 13:19
  • @David Lehavi Thank You for Your effort. Since Your solution is valid C code, You can copy-paste it into Your favorite IDE, compile it and see it fail like I did (a: {1,2,2}). Dear David, I do not mean to bash You at all! I'm very interested in the solution (so much that I'll start a bounty). And when I heard that someone came up with a solution, I wanted to make sure it was right and therefore intensively looked for errors. Maybe You did an error in reasoning or maybe You are on the right track.. wish You luck winning the bounty. Greetings. – Dave O. Jul 06 '10 at 15:30