In less-than-linear time, find the duplicate in a sorted array

Question

Today, an interviewer asked me this question. My immediate response was that we could simply do a linear search, comparing the current element with the previous element in the array. He then asked me how the problem could be solved in less-than-linear time.

Assumptions

The array is sorted
There is only one duplicate
The array is only populated with numbers [0, n], where n is the length of the array.

Example array: [0,1,2,3,4,5,6,7,8,8,9]

I attempted to come up with a divide-and-conquer algorithm to solve this, but I'm not confident that it was the right answer. Does anyone have any ideas?

your example includes only numbers `[0,n-2]` (there are no `10`, `11`) Is it just an example or a general rule? — jfs, Mar 17 '12 at 18:33

score 25 · Accepted Answer · answered Mar 12 '12 at 19:42

25

Can be done in O(log N) with a modified binary search:

Start in the middle of the array: If array[idx] < idx the duplicate is to the left, otherwise to the right. Rinse and repeat.

answered Mar 12 '12 at 19:42

BrokenGlass

158,293
28
286
335

It should be `array[idx]-array[0]` for generic version. – user Jul 10 '13 at 15:50

Daniel Fischer · Answer 2 · 2012-03-12T19:55:17.773

7

If no number is missing from the array, as in the example, it's doable in O(log n) with a binary search. If a[i] < i, the duplicate is before i, otherwise it's after i.

If there is one number absent and one duplicate, we still know that if a[i] < i the duplicate must be before i and if a[i] > i, the absent number must be before i and the duplicate after. However, if a[i] == i, we don't know if missing number and duplicate are both before i or both after i. I don't see a way for a sublinear algorithm in that case.

edited Mar 12 '12 at 19:55

answered Mar 12 '12 at 19:42

Daniel Fischer

181,706
17
308
431

I'm very late, but if you allow missing numbers, it's indeed impossible (assuming you can't read an arbitrarily large number of cells in O(1)). Suppose we consider entries of size n+1 (n>=2) and we limit ourselves to this subset of entries : {[0,0,2,…,n], [0,1,1,3,…,n], ..., [0,1,…,k,k,k+2,…,n], ..., [0,1,…,n-1,n-1]}. Let's assume you already know the contents of any of up to (n-2) cells and that they were pairwise distinct, there's still at least 2 possibilities and you can't discriminate any. Therefore you need to read at least (n-1) cells to decide which number is the duplicate. – Caninonos Jan 16 '18 at 15:39

aioobe · Answer 3 · 2012-03-12T19:55:21.427

I attempted to come up with a divide-and-conquer algorithm to solve this, but I'm not confident that it was the right answer.

Sure, you could do a binary search.

If arr[i/2] >= i/2 then the duplicate is located in the upper half of the array, otherwise it is located in the lower half.

while (lower != upper)
    mid = (lower + upper) / 2
    if (arr[mid] >= mid)
        lower = mid
    else
        upper = mid-1

Since the array between lower and upper is halved in each iteration, the algorithm runs in O(log n).

ideone.com demo in Java

score 1 · Answer 4 · answered May 26 '15 at 13:23

1

Difference between sum of given array elements and sum of 0 to n-1 natural numbers gives you the duplicated element. Sum of 0 to n-1 elements is (N * N-1)/2 example array is [0,1,2,3,4,5,6,7,8,8,9] sum of 0 to 9 natural numbers is : 45 sum of given array elements : 53 53-45 = 8 Which is the duplicated element

answered May 26 '15 at 13:23

chenna

159
1
4
14

adding up all elements is O(n) - and therefore over budget – tucuxi Jan 23 '17 at 20:50

score 1 · Answer 5 · answered Oct 09 '18 at 16:44

#include <bits/stdc++.h>
using namespace std;

int find_only_repeating_element(int arr[] , int n){
int low = 0;
int high = n-1;
while(low <= high){
    int mid = low + (high - low)/2;
    if(arr[mid] == arr[mid + 1] || arr[mid] == arr[mid - 1]){
        return arr[mid];
    }
    if(arr[mid] < mid + 1){
        high = mid - 2;
    }else{
        low = mid + 1;
    }
   }
   return -1;
}

int main(int argc, char const *argv[])
{
int n , *arr;
cin >> n;
arr = new int[n];
for(int i = 0 ; i < n ; i++){
    cin >> arr[i];
}
    cout << find_only_repeating_element(arr , n) << endl;
    return 0;
}

Welcome to SO. Please consider adding some explanation to your code when answering a question. — f-CJ, Oct 09 '18 at 17:02

score 0 · Answer 6 · answered May 26 '14 at 19:34

How about that? (recursion style)

public static int DuplicateBinaryFind(int[] arr, int left, int right)
{
   int dup =0;

   if(left==right)
   {
      dup = left;
   }
   else
   {
        int middle = (left+right)\2;
        if(arr[middle]<middle)
        {
          dup = DuplicateBinaryFind(arr,left, middle-1);

        }
        else
        {
           dup = DuplicateBinaryFind(arr, middle+1, right);
        }
   }

   return dup;

}

score -1 · Answer 7 · edited May 23 '17 at 10:32

The example array is a little bit different from your question. Since n is the length of array and there are one and only duplicate in array, the value of each element in array should be in [0,n-1].

If that is true, then this question is the same one with How to find a duplicate element in an array of shuffled consecutive integers?

The following code should find the duplicate in O(n) time and O(1) space.

public static int findOnlyDuplicateFromArray(int[] a, boolean startWithZero){
    int xor = 0;
    int offset = 1;
    for(int i=0; i < a.length; i++){
        if(startWithZero)
            xor = xor ^ (a[i] + offset) ^ i;
        else
            xor = xor ^ a[i] ^ i;
        }
        if(startWithZero)
            xor = xor - offset;
    return xor;
}

Sorry, this is not a less linear time. Should use binary search to achieve the goal. — user1106083, Mar 17 '12 at 06:54

In less-than-linear time, find the duplicate in a sorted array

7 Answers7

Linked