3

I had an interview today which involved this very question and in order to widen my algorithmic knowledge. I am trying to see if there are any better suggestions.

I was trying to find duplicates in an array without using java.util and widen my algorithmical knowledge in regards to addressing space and time complexities.

Below is the code I produced during the technical assessment:

 public static boolean isThereDuplicates(int[] A){
    
    for (int i = 0; i < A.length; i++)
        for (int j = i + 1; j < A.length; j++){
            if (A[i] == A[j])
                return true;
        }
   
            return false;
}

This simple algorithm looks identical to the Bubble Sort, which runs in O(N^2). Is there any other better algorithms that I could use to achieve this?

peterh
  • 11,875
  • 18
  • 85
  • 108
L-Samuels
  • 2,712
  • 9
  • 34
  • 42
  • possible duplicate of [Find the repeated element](http://stackoverflow.com/questions/7059780/find-the-repeated-element) – templatetypedef Sep 13 '11 at 21:39
  • 2
    First of all do we know anything more about array A? And are we bound by space complexity? I see from your other comment that we can have only integers from 1 to N, which allows us to use James' method (given that we have enough space for the count array). If we didn't know anything about A's elements then e still could sort the array and see if there are any neighbours that are identical (even during sorting) which would give us O(nlogn). Basically that's what you did but you chose a very inefficient sorting method :) – Mateusz Dymczyk Sep 13 '11 at 21:50
  • yeah thanks Zen I knew the code i was writing was inefficient even as i was jotting it but the test was very difficult and I was heavily bounded by time constraints and algorithmic inexperience. – L-Samuels Sep 13 '11 at 21:57
  • will it be an array of integers or an array of whatever? Special tricks apply to integers – harold Sep 14 '11 at 08:23
  • its an array of ints @harold. – L-Samuels Sep 14 '11 at 12:01

3 Answers3

4

If the values of A are reasonably bounded (i.e. you have enough RAM) you could use the bones of the radix-sort algorithm to find a duplicate in O(n).

public static boolean containsDuplicates(int[] A)
{
    // Create a zero-initialised array the size of the maximum allowed value in A.
    int[] count = new int[maximumValuePossible(A)];

    for (int i = 0; i < A.length; i++)
    {
        if (count[A[i]] != 0)
        {
            // The value at A[i] is already in the histogram -> duplicate!
            return true;
        }

        // A[i] is not in the histogram yet.
        count[A[i]]++;
    }

    return false;
}

Edit: To return a copy of the array with duplicates removed you could then do:

public static int[] stripped(int[] A)
{
    int[] count = new int[maximumValuePossible(A)];
    int uniques = 0;

    for (int i = 0; i < A.length; i++)
    {
        count[A[i]]++;
        if (count[A[i]] == 1)
        {
            uniques++;
        }
    }

    if (uniques == 0) return null;

    int[] retArray = new int[uniques];
    int retIndex = 0;
    for (int i = 0; i < count.length; i++)
    {
        if (count[i] > 0)
        {
            retArray[retIndex++] = count[i];
        }
    }

    return retArray;
}
James
  • 9,064
  • 3
  • 31
  • 49
  • If i remember correctly the array A of length N could only contain integers from 1 to N. – L-Samuels Sep 13 '11 at 21:37
  • 1
    Well then its easy, you just need an index occurrence count array. – RBarryYoung Sep 13 '11 at 21:42
  • Yes thanks James for the very extensive answer. if this sort of question comes up next time i'll be more prepared. – L-Samuels Sep 13 '11 at 21:59
  • @james. sorry ive just been looking over the code and due my infancy in programming i have to break it down semantically in order to gain full understanding. In the contains duplicate method you instantiate a count array to the maximumValuePossible. What exactly is meant by maxValPossible? if i had an array {1,1,2,2,3,4,5}. Is it just the largest value in the array, in this case 5 or am i missing something really trivial. Is it also possible for you to add some comments to the containsDuplicate method? This would be really helpful for me. I will vote up as gratitude. Thanks – L-Samuels Sep 15 '11 at 22:42
  • 1
    @Lawrence88: It's the maximum possible value in the array, so in your example above it would be 5 (and for your interview question it would be N). Essentially you construct a histogram of the array values: [0:0],[1:2],[2:2],[3:1],[4:1],[5:1]. If there was no upper bound (i.e. the full 32-bits range could be used) you'd have to allocate a 4GB array to hold the histogram which would make the technique unfeasible on most machines. – James Sep 16 '11 at 14:51
  • Thanks a lot geezer. Vote Up! :D – L-Samuels Sep 16 '11 at 23:28
3

The SOP solution to this is via Hashing. Which is O(n), and in deference to james, is sort of the bones of the radix-sort algorithim (or maybe just the marrow).

RBarryYoung
  • 55,398
  • 14
  • 96
  • 137
  • +1 Using a HashSet here would still be O(N) without the large memory overhead. Also the simplest solution at that ;) – Voo Sep 13 '11 at 22:09
  • Voo: for the general problem it is the simplest, but if A[n] is really restricted to 1..n as Lawrence implied in one of the comments, then I think that an occurrence or count array is even simpler. – RBarryYoung Sep 13 '11 at 22:36
  • I think it's pretty equal and isn't limited to any specific N. `if (map.put(arr[i],0) != NULL) return false;` is all we need if we want to find if there are any duplicates or a second put (storing the old value + 1) if we want the count. – Voo Sep 13 '11 at 23:28
  • @Voo the question was without using java.util. – L-Samuels Sep 14 '11 at 10:02
  • @Lawrence88 Ah sorry missed that. Ok in that it'd certainly be more work. – Voo Sep 14 '11 at 14:41
1

You could also sort the array with any O(nlogn) sorting algorithm then do a linear scan of the sorted array to see if element i and i+1 are equal. Total run time will be O(nlogn). The space complexity would depend on the sorting algorithm used.

Jay
  • 9,314
  • 7
  • 33
  • 40