9

Does anyone know a faster-than-linear algorithm for finding a duplicate in a sequential list of numbers? I'm working in Java now but any language or psuedo-code is fine.

For example, given this int[] input:

0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 7 | 8 | 9

Output would be either index or value '7'.

I know the obvious traversal at O(n) linear time, but I'm trying to see if this is possible via binary search at O(log n) time.

Himanshu
  • 31,810
  • 31
  • 111
  • 133
Nick Jonas
  • 1,197
  • 4
  • 13
  • 24
  • What's the obvious linear time one? Are you assuming the list is sorted? – sshannin Sep 11 '12 at 14:51
  • You only have one duplicate value? – Nejc Sep 11 '12 at 14:55
  • Sounds like an interview question ;) – Peter Lawrey Sep 11 '12 at 14:57
  • If the list is sequential, it's sorted. The obvious method is traversing through the list and comparing list[n] to list[n+1], which takes at worst O(n) comparisons. – Nick Jonas Sep 11 '12 at 14:58
  • Wasn't sure what you meant by sequential...If you mean increasing by 1, see Peter's answer (just look for when you see a number at a higher index than you would expect). – sshannin Sep 11 '12 at 14:59
  • Using Binary search to find the duplicate in sequential array will help you solve the problem within o(log n) time complexity. The key is to look for the anomaly in the sequence => i.e., pick the middle number see if it is the expected value else the required duplicate should have happened in the right half of the array. Following article has a sample implementation considering a fact that the sequence may start from any number (negative or 1 or any positive number) http://davidsekar.com/algorithms/finding-duplicate-number-in-a-sequence – David Chelliah Jun 05 '16 at 10:04

3 Answers3

14

If you assume the numbers must start at 0 and be increasing by 1 you can compare the middle to the index. If the middle is the same go higher, if the middle is not, go lower.

This will give you binary search time O(log2 N). The only difference is that you are comparing with the index, rather than a fixed value.


public static void main(String... args) {
    int[] array = {0, 1, 2, 3, 4, 5, 6, 7, 7, 8, 9};
    int duplicate = findDuplicate(array);
    System.out.println(duplicate);
}

private static int findDuplicate(int[] array) {
    int low = 0;
    int high = array.length - 1;

    while (low <= high) {
        int mid = (low + high) >>> 1;
        int midVal = array[mid];

        if (midVal == mid)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return high;
}
Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • If by "start at 0 and be increasing" you mean sorted : P This still fails with gaps though: 0 | 3 | 14 – sshannin Sep 11 '12 at 14:55
  • 2
    @sshannin I meant to write "increasing by 1" :P – Peter Lawrey Sep 11 '12 at 14:56
  • The midpoint should always be proportional to the list length if it's a sequential list with NO duplicates, right? So on the nth iteration, the midpoint should equal (length - (length / (n + 1))? When first halving my list example, on the first iteration, the midpoint should be exactly 5 but it's 5.5 rounded down to 5 - this is only because there is an extra duplicate in the second half, showing that we need to look in the second half. I just don't know how this would look in code, and if I'm overthinking this? – Nick Jonas Sep 11 '12 at 15:06
  • I suggest you try writing some of the code based on the code for a binary search e.g. Arrays.binarySearch ;) – Peter Lawrey Sep 11 '12 at 15:09
  • what do you mean if the middle is the same as the index? what's the index? – Nick Jonas Sep 11 '12 at 15:10
  • I know the standard binary search algorithm, but having trouble writing this one =P – Nick Jonas Sep 11 '12 at 15:12
  • Ok, have a look at my solution. ;) – Peter Lawrey Sep 11 '12 at 15:16
1

Notice that binary search is meant to work on sorted lists. So if you have a sorted list with duplicates, binary search will only be useful if your duplicates are adjacent. The importance of being adjacent is so that you can test the presence of the key at the previous and next position of the found key. Any other way of trying to use binary search on unsorted lists will give incorrect results.

Here is a bit of code to show what I mean.

import java.util.Arrays;
public class Main {
    public static void main(String[] args) {
        int[] list = {1, 2, 3, 4, 5, 6, 7, 7, 8, 9 };
        int key = 7;
        int result = Arrays.binarySearch(list, key);
        System.out.println(result);
        if( list[result+1] == key  || list[result-1] == key )
                System.out.println("yes we have a duplicate.");
    }
}

The comparison in the if being O(1) we have remain with the O(logn) of binary search.

nt.bas
  • 736
  • 1
  • 6
  • 13
  • But the poster said that his list is sequential/sorted...so duplicates will be next to each other. – jeff Sep 11 '12 at 15:19
  • "( Let's relax a bit the definition of list here because mathematical lists don't allow duplicates. )" What? – argentage Sep 11 '12 at 16:12
  • @airza : too bad! I was thinking about properties of lists but writing those of sets. Actually lists allow duplicates. Corrected. Thanks for pointing that out. – nt.bas Sep 12 '12 at 07:02
1
public class DuplicateNumber {

    public int findDuplicateNumber(List<Integer> numbers){

        int highestNumber = numbers.size() - 1;
        int total = getSum(numbers);
        int duplicate = total - (highestNumber*(highestNumber+1)/2);
        return duplicate;
    }

    public int getSum(List<Integer> numbers){

        int sum = 0;
        for(int num:numbers){
            sum += num;
        }
        return sum;
    }

    public static void main(String a[]){
        List<Integer> numbers = new ArrayList<Integer>();
        for(int i=1;i<30;i++){
            numbers.add(i);
        }
        //add duplicate number into the list
        numbers.add(22);
        DuplicateNumber dn = new DuplicateNumber();
        System.out.println("Duplicate Number: "+dn.findDuplicateNumber(numbers));
    }
}
kleopatra
  • 51,061
  • 28
  • 99
  • 211
rajesh
  • 11
  • 1