128

Input: Given an array of n elements which contains elements from 0 to n-1, with any of these numbers appearing any number of times.

Goal : To find these repeating numbers in O(n) and using only constant memory space.

For example, let n be 7 and array be {1, 2, 3, 1, 3, 0, 6}, the answer should be 1 & 3. I checked similar questions here but the answers used some data structures like HashSet etc.

Any efficient algorithm for the same?

Zaki
  • 1,305
  • 3
  • 9
  • 3
  • Method using [in place modification](https://leetcode.com/problems/find-all-duplicates-in-an-array/discuss/241917/Python-wo-output-space) (unlike existing answers which use O(n) space for output) – tejasvi88 Jan 24 '22 at 14:39

13 Answers13

173

This is what I came up with, which doesn't require the additional sign bit:

for i := 0 to n - 1
    while A[A[i]] != A[i] 
        swap(A[i], A[A[i]])
    end while
end for

for i := 0 to n - 1
    if A[i] != i then 
        print A[i]
    end if
end for

The first loop permutes the array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that it may not look O(n) at first blush, but it is - although it has a nested loop, it still runs in O(N) time. A swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

The second loop prints the values of x for which A[x] doesn't equal x - since the first loop guarantees that if x exists at least once in the array, one of those instances will be at A[x], this means that it prints those values of x which are not present in the array.

(Ideone link so you can play with it)

caf
  • 233,326
  • 40
  • 323
  • 462
  • 1
    This is pretty brilliant. Did you just come up with that yourself? – Andrew Rasmussen Apr 21 '11 at 05:19
  • 14
    @arasmussen: Yeah. I came up with a broken version first, though. The contraints of the problem give a bit of a clue to the solution - the fact that every valid array value is also a valid array index hints at `a[a[i]]`, and the O(1) space constraint hints at the `swap()` operation being key. – caf Apr 21 '11 at 05:26
  • 2
    @caf : Please run your code with the array as {3,4,5,3,4} it fails. – NirmalGeo Apr 21 '11 at 05:54
  • 8
    @NirmalGeo: That is not a valid input, because `5` is not in the range `0..N-1` (`N` in this case being `5`). – caf Apr 21 '11 at 05:55
  • 2
    @caf the output for {1,2,3,1,3,0,0,0,0,6} is 3 1 0 0 0 or in any case where repetition is more than 2. Is it correct o/p ? – Terminal Apr 21 '11 at 06:42
  • @Terminal: I believe so, it was certainly my intention anyway. – caf Apr 21 '11 at 06:56
  • @Terminal: I've just checked, and it's possible to modify the algorithm to only print each duplicated number once, whilst maintaining the `O(N)` time / `O(1)` space properties. However I feel that it is less elegant (the details are left as an exercise for the reader ;). – caf Apr 21 '11 at 07:04
  • 3
    This is amazing! I've seen a number of variants on this question, usually more constrained, and this is the most general way to solve it that I've seen. I'll simply mention that changing the `print` statement to `print i` turns this into a solution to http://stackoverflow.com/questions/5249985/finding-missing-elements-in-an-array and (assuming the "bag" is a modifiable array) Qk of http://stackoverflow.com/questions/3492302/easy-interview-question-got-harder-given-numbers-1-100-find-the-missing-number. – j_random_hacker Apr 22 '11 at 00:38
  • @Terminal, @caf: I have taken up your challenge of printing dupes once only; see my answer. BTW I think it's pretty elegant myself :-P – j_random_hacker Apr 22 '11 at 02:53
  • 1
    I note that, with an initial random array A, the number the inner while loop is run is always very close to `0.63 * n`, does anyone have an explanation? – rafak Apr 23 '11 at 12:17
  • I think after swap you should do i++. – Saeed Amiri Nov 21 '11 at 07:20
  • @SaeedAmiri: No, for some inputs that loop needs to run several times with the same `i`. Note that after the swap, `A[A[i]]` is necessarily accessing a different array location from the previous iteration. – caf Nov 21 '11 at 07:32
  • Is it possible to find the missing number also? – parsh Jan 05 '13 at 20:10
  • Sorry, i found the solution, need to replace A[i] with i while printing. Anyways was wondering will the algorithm change or perform bad, in case size is pretty large, say 10^6, with the numbers 0 through (10^6)-1. – parsh Jan 05 '13 at 20:59
  • I don't get it. How is the while loop not an infinite one if its condition matches? – Nikos C. Sep 19 '13 at 08:58
  • @NikosC.: Because if the swap occurs, then the next time the condition is evaluated `A[i]` will have changed, so `A[A[i]]` is not the same array location that it was last time. – caf Sep 19 '13 at 09:08
  • This solution does not look like O(n) to me. Even on average, it is O(n^2) (please see @rafak's comment) – user5054 Jan 14 '18 at 07:35
  • 1
    @user5054: It is O(n), the explanation for this is in the answer. rafak is not claiming that the inner loop runs ~0.63n iterations for every iteration of the outer loop, but that it runs ~0.63n iterations (of the body, ie the swap) *in total*. – caf Jan 15 '18 at 03:25
  • @caf I found better solution for *Finding duplicates in O(n) time and O(1) space*. You can use this link https://www.geeksforgeeks.org/duplicates-array-using-o1-extra-space-set-2/ – CrazyPro007 Jan 15 '19 at 14:45
  • 1
    @CrazyPro007: The problem with that solution is that it requires storing numbers larger than n-1 in the array elements - that's basically hidden extra space, approximately O(n log n). – caf Jan 16 '19 at 04:49
  • 1
    Consider the case where the array elements are 16-bit unsigned integers and n is 65536. – caf Jan 16 '19 at 22:39
  • This is devilishly clever. It took me an embarrassing amount of time to realise why this works. Great algorithm! – Elliott Sep 17 '20 at 10:46
38

caf's brilliant answer prints each number that appears k times in the array k-1 times. That's useful behaviour, but the question arguably calls for each duplicate to be printed once only, and he alludes to the possibility of doing this without blowing the linear time/constant space bounds. This can be done by replacing his second loop with the following pseudocode:

for (i = 0; i < N; ++i) {
    if (A[i] != i && A[A[i]] == A[i]) {
        print A[i];
        A[A[i]] = i;
    }
}

This exploits the property that after the first loop runs, if any value m appears more than once, then one of those appearances is guaranteed to be in the correct position, namely A[m]. If we are careful we can use that "home" location to store information about whether any duplicates have been printed yet or not.

In caf's version, as we went through the array, A[i] != i implied that A[i] is a duplicate. In my version, I rely on a slightly different invariant: that A[i] != i && A[A[i]] == A[i] implies that A[i] is a duplicate that we haven't seen before. (If you drop the "that we haven't seen before" part, the rest can be seen to be implied by the truth of caf's invariant, and the guarantee that all duplicates have some copy in a home location.) This property holds at the outset (after caf's 1st loop finishes) and I show below that it's maintained after each step.

As we go through the array, success on the A[i] != i part of the test implies that A[i] could be a duplicate that hasn't been seen before. If we haven't seen it before, then we expect A[i]'s home location to point to itself -- that's what's tested for by the second half of the if condition. If that's the case, we print it and alter the home location to point back to this first found duplicate, creating a 2-step "cycle".

To see that this operation doesn't alter our invariant, suppose m = A[i] for a particular position i satisfying A[i] != i && A[A[i]] == A[i]. It's obvious that the change we make (A[A[i]] = i) will work to prevent other non-home occurrences of m from being output as duplicates by causing the 2nd half of their if conditions to fail, but will it work when i arrives at the home location, m? Yes it will, because now, even though at this new i we find that the 1st half of the if condition, A[i] != i, is true, the 2nd half tests whether the location it points to is a home location and finds that it isn't. In this situation we no longer know whether m or A[m] was the duplicate value, but we know that either way, it has already been reported, because these 2-cycles are guaranteed not to appear in the result of caf's 1st loop. (Note that if m != A[m] then exactly one of m and A[m] occurs more than once, and the other does not occur at all.)

Community
  • 1
  • 1
j_random_hacker
  • 50,331
  • 10
  • 105
  • 169
  • 2
    Yes, that's very similar to that which I came up with. It's interesting how an identical first loop is useful for several different problems, just with a different printing loop. – caf Apr 22 '11 at 04:17
25

Here is the pseudocode

for i <- 0 to n-1:
   if (A[abs(A[i])]) >= 0 :
       (A[abs(A[i])]) = -(A[abs(A[i])])
   else
      print i
end for

Sample code in C++

Jordan Borisov
  • 1,603
  • 6
  • 34
  • 69
Prasoon Saurav
  • 91,295
  • 49
  • 239
  • 345
  • 3
    Very clever - encoding the answer in the sign bit of the indexed entry! – holtavolt Apr 21 '11 at 03:14
  • 3
    @sashang : It can't be. Check out the problem specification. "Given an array of n elements **which contains elements from 0 to n-1**" – Prasoon Saurav Apr 21 '11 at 03:15
  • 5
    This won't detect duplicate 0s, and will spot the same number as being a duplicate multiple times. – Null Set Apr 21 '11 at 03:15
  • 1
    @Null Set: You can just replace `-` with `~` for the zero issue. – user541686 Apr 21 '11 at 03:16
  • 29
    This may be the answer that the problem is driving at, but technically it uses `O(n)` hidden space - the `n` sign bits. If the array is defined such that each element can only hold values between `0` and `n-1`, then it obviously does not work. – caf Apr 21 '11 at 03:21
  • ...and the algorithm you've presented here doesn't work, anway - for the OPs example it prints `3, 4` and leaves the array as `{ -1, -2, -3, 3, 0, -6 }`. – caf Apr 21 '11 at 03:32
  • If input is `{1, 0, 3, 1, 3, 0, 6}` above code and the C++ code prints `3, 0` and duplicate leaving 1. – anubhava Apr 21 '11 at 03:39
  • Also, the problem description needs to specify or at least imply that the data type of the array is signed integers (thus the extra bit is available). It usually does, but the phrasing of this particular question does not - "array of n elements" should say "array of n integers" or "array of n ints". – Terry Mahaffey Apr 21 '11 at 03:46
  • @caf: his code works when I test it on the OPs input, and the changes to the array are reversible with a second iteration over the array; O(2N) is still O(N). However, the sign bits are a fair complaint. If your input array holds 8-bit integers, and N > 127, you're gonna run into issues. – Dennis Zickefoose Apr 21 '11 at 03:48
  • @Dennis: The C++ code and the code in the answer are not the same - one prints `i` and the other prints `abs(A[i])`. – caf Apr 21 '11 at 03:51
  • This is not a solution; see caf's comment. – R.. GitHub STOP HELPING ICE Apr 21 '11 at 03:52
  • @caf: So it is; either way it is still mucking up zeros, so it needs work. – Dennis Zickefoose Apr 21 '11 at 03:58
  • You know, this algo is actually faster than the one caf provided since he's doing a multiply by -1 rather than a swap. If he tweaked it so that the zero case wasn't there (not hard to do), this could be faster than the swap algo. – Andrew Rasmussen Apr 21 '11 at 05:36
  • 1
    In this approach, the problem of redundant outputs could be solved by stealing another bit to serve as a "has been printed" flag. For example, with an array of 8-bit unsigned integers, require that n < 2^6, and reserve `A[i] & 1<<7` as the "is a duplicate" flag (like the minus sign in the answer) and reserve `A[i] & 1<<6` as a "has been printed" flag. – gcbenison Jan 26 '12 at 22:13
2

For relatively small N we can use div/mod operations

n.times do |i|
  e = a[i]%n
  a[e] += n
end

n.times do |i| 
  count = a[i]/n
  puts i if count > 1
end

Not C/C++ but anyway

http://ideone.com/GRZPI

hoha
  • 4,418
  • 17
  • 15
  • +1 Nice solution. Stopping adding *n* to an entry after two times will accommodate larger *n*. – Apshir Dec 21 '11 at 06:56
1

Let's assume that we present this array as a uni-directional graph data structure - each number is a vertex and its index in the array points to another vertex forming an edge of the graph.

For even more simplicity we have indices 0 to n-1 and range of number from 0..n-1. e.g.

   0  1  2  3  4 
 a[3, 2, 4, 3, 1]

0(3) --> 3(3) is a cycle.

Answer: Just traverse the array relying on indices. if a[x] = a[y] then it's a cycle and thus duplicate. Skip to the next index and continue again and so forth until the end of of an array. Complexity: O(n) time and O(1) space.

Ivan Voroshilin
  • 5,233
  • 3
  • 32
  • 61
  • Hmm. I can't see the nice link between cycles and duplicates. Consider `array = [1, 0]`: element s 0 and 1 cycle, but aren't duplicates. What you could deduce, is that if you use this traversal method and reach a cycle, that the last element *before* the cycle is a duplicate, eg: `array = [1, 2, 3, 4, 2]`. This creates a few new problems. How would you detect a cycle without using extra memory and time. – Elliott Oct 09 '20 at 11:23
  • Secondly, even if you could detect when you've cycled back in constant time and space, what about arrays like this: `array = [1, 2, ...., n - 1, 0, 0]` (the single duplicate of the `0` value). Going through the cycles for each element would take `O(n)` time and so all-up it would be `O(n^2)` time. – Elliott Oct 09 '20 at 11:27
  • @Elliott I believe this is "Floyd's cycle detection algorithm", it has been proven to take O(n) time for finding a duplicate. – Zenquiorra May 17 '21 at 08:58
  • @Zenquiorra, I think my example above is proof enough that this doesn't work. Also, Ivan here wasn't describing Floyd's method, which uses two speeds of traversal. Besides, Floyd *could* be adjusted here to determine whether a duplicate exists or not (in `O(n)` time and `O(1)` space), but it wouldn't help provide a solution. – Elliott May 17 '21 at 09:35
  • @Elliott Aren't they using two speeds of traversals when they mention? `a[x] = a[y]` where x and y are two indices (two different speeds)? – Zenquiorra May 17 '21 at 10:44
  • And when you say : " Floyd could be adjusted here to determine whether a duplicate exists or not (in O(n) time and O(1) space), but it wouldn't help provide a solution." Why won't it provide a solution? The entry point of the cycle would be the solution, right? – Zenquiorra May 17 '21 at 10:53
  • @Zenquiorra, For Floyd it's crucial that one pointer (or number, in our case) traverses twice while the other traverses exactly once. Ivan's algorithm isn't that precise, but there's nothing like that mentioned at all. As for your second point, maybe think through some examples? Firstly, a cycle doesn't imply that there's any duplicates, eg. [1,2,0,3]. Secondly, how would you know which are the "entry points" of the cycle when using constant memory? – Elliott May 17 '21 at 13:59
  • @Elliott It is possible to find entry points in O(1) space. When both the pointers meet, set one of them back to index 0, while the other stays right where they met, now move both the pointers one step at a time at the same speed, where they meet `again` would be the entry point. There is proof to it on why it works. Also, it is indeed true that cycle doesn't guarantee a duplicate, but as I said "The entry points of the cycle would be the solution", there won't be any entry point in the example you have mentioned which is what we will get (i.e. they will never meet `again`.) – Zenquiorra May 17 '21 at 15:03
  • Although I think you are right for the case when there are multiple duplicates, this approach would not work. But if there is a variation to it, then I am not sure how we would implement that. – Zenquiorra May 17 '21 at 15:52
  • @Zenquiorra, Yeah, that actually works for 1 duplicate. It took me paper and pen to realise why that method works for finding the entry point. I stand corrected on my second comment! =P – Elliott May 17 '21 at 17:06
1

Not really pretty but at least it's easy to see the O(N) and O(1) properties. Basically we scan the array and, for each number we see if the corresponding position has been flagged already-seen-once (N) or already-seen-multiple-times (N+1). If it is flagged already-seen-once, we print it and flag it already-seen-multiple-times. If it is not flagged, we flag it already-seen-once and we move the original value of the corresponding index to the current position (flagging is a destructive operation).

for (i=0; i<a.length; i++) {
  value = a[i];
  if (value >= N)
    continue;
  if (a[value] == N)  {
    a[value] = N+1; 
    print value;
  } else if (a[value] < N) {
    if (value > i)
      a[i--] = a[value];
    a[value] = N;
  }
}

or, better yet (faster, despite the double loop):

for (i=0; i<a.length; i++) {
  value = a[i];
  while (value < N) {
    if (a[value] == N)  {
      a[value] = N+1; 
      print value;
      value = N;
    } else if (a[value] < N) {
      newvalue = value > i ? a[value] : N;
      a[value] = N;
      value = newvalue;
    }
  }
}
CAFxX
  • 28,060
  • 6
  • 41
  • 66
  • 1
    +1, it works nicely, but it took a bit of thought to figure out exactly why `if (value > i) a[i--] = a[value];` works: if `value <= i` then we have already processed the value at `a[value]` and can overwrite it safely. Also I wouldn't say the O(N) nature is obvious! Spelling it out: The main loop runs `N` times, plus however many times the `a[i--] = a[value];` line runs. That line can only run if `a[value] < N`, and each time it runs, immediately afterwards an array value that was not already `N` is set to `N`, so it can run at most `N` times, for a total of at most `2N` loop iterations. – j_random_hacker Jul 14 '12 at 17:23
0

Check out the explanation here https://youtu.be/qJ_Y7pKP0e4

code here https://github.com/TechieExpress/DataStructures/blob/main/findDuplicates

Code snippet:

/**
*
* @author techieExpress
*
* You are given a list of n-1 integers and these integers are in the range * of 1 to n.
* Input: Given an array of n elements which contains elements 
* from 0 to n-1, with any of these numbers appearing any number of times.
* 
* Goal: To find these repeating numbers in O(n) and using only constant  * * memory space.
**/

public class findDuplicates {
    
    
    public static void main(String args[])
    {
        int arr[] = { 2,1,1,2 };
  
        for (int i = 0; i < arr.length; i++) {
            arr[arr[i] % arr.length]
                = arr[arr[i] % arr.length]
                  + arr.length;
        }
        System.out.println("The repeating elements are : ");
        for (int i = 0; i < arr.length; i++) {
            
            //System.out.print(numRay[i]);
            if (arr[i] >= arr.length * 2) {
                System.out.println(i + " ");
                arr[i]=arr[i]%arr.length;
            }
        }
    }

}
-1

One solution in C is:

#include <stdio.h>

int finddup(int *arr,int len)
{
    int i;
    printf("Duplicate Elements ::");
    for(i = 0; i < len; i++)
    {
        if(arr[abs(arr[i])] > 0)
          arr[abs(arr[i])] = -arr[abs(arr[i])];
        else if(arr[abs(arr[i])] == 0)
        {
             arr[abs(arr[i])] = - len ;
        }
        else
          printf("%d ", abs(arr[i]));
    }

}
int main()
{   
    int arr1[]={0,1,1,2,2,0,2,0,0,5};
    finddup(arr1,sizeof(arr1)/sizeof(arr1[0]));
    return 0;
}

It is O(n) time and O(1) space complexity.

Eric Leschinski
  • 146,994
  • 96
  • 417
  • 335
Anshul garg
  • 233
  • 2
  • 6
  • 2
    The space complexity of this is O(N), because it uses N additional sign bits. The algorithm should work under the assumption that the array element type can *only* hold numbers from 0 to N-1. – caf Sep 12 '12 at 10:55
  • yes that true but for asked algo its perfect as they wanted the algo for numbers 0 to n-1 only and also i checked your solution its going above O(n) so i thought of this – Anshul garg Sep 12 '12 at 11:02
-1

We can do it O(n) time and O(1) space complexity by -

  1. take the ith array element.

  2. Make it +ve if it's negative

  3. Last, multiply with -1 to the number getting from array index (ith element).

  4. If the number positive, return the index.

     def findDuplicate(self, arr: List[int]) -> int:
         n=len(arr)
         for i in range(0,n):
    
             arr[(abs(arr[i]))-1]=arr[(abs(arr[i]))-1]*(-1)
             if arr[(abs(arr[i]))-1]>0:
                 return abs(arr[i])
    
-2
static void findrepeat()
{
    int[] arr = new int[7] {0,2,1,0,0,4,4};

    for (int i = 0; i < arr.Length; i++)
    {
        if (i != arr[i])
        {
            if (arr[i] == arr[arr[i]])
            {
                Console.WriteLine(arr[i] + "!!!");
            }

            int t = arr[i];
            arr[i] = arr[arr[i]];
            arr[t] = t;
        }
    }

    for (int j = 0; j < arr.Length; j++)
    {
        Console.Write(arr[j] + " ");
    }
    Console.WriteLine();

    for (int j = 0; j < arr.Length; j++)
    {
        if (j == arr[j])
        {
            arr[j] = 1;
        }
        else
        {
            arr[arr[j]]++;
            arr[j] = 0;
        }
    }

    for (int j = 0; j < arr.Length; j++)
    {
        Console.Write(arr[j] + " ");
    }
    Console.WriteLine();
}
Eli
  • 7
  • 2
-2
private static void printRepeating(int arr[], int size) {
        int i = 0;
        int j = 1;
        while (i < (size - 1)) {
            if (arr[i] == arr[j]) {
                System.out.println(arr[i] + " repeated at index " + j);
                j = size;
            }
            j++;
            if (j >= (size - 1)) {
                i++;
                j = i + 1;
            }
        }

    }
  • The above solution will achieve the same in time complexity of O(n) and constant space. – user12704811 Jan 13 '20 at 14:17
  • 3
    Thank you for this code snippet, which might provide some limited short-term help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its long-term value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you've made. – Toby Speight Jan 13 '20 at 17:21
  • 4
    BTW, the time complexity seems to be O(n²) here - hiding the inner loop doesn't change that. – Toby Speight Jan 13 '20 at 17:23
-2

Algorithm can be readily seen in the following C function. Retrieving original array, although not required, will be possible taking each entry modulo n.

void print_repeats(unsigned a[], unsigned n)
{
    unsigned i, _2n = 2*n;
    for(i = 0; i < n; ++i) if(a[a[i] % n] < _2n) a[a[i] % n] += n;
    for(i = 0; i < n; ++i) if(a[i] >= _2n) printf("%u ", i);
    putchar('\n');
}

Ideone Link for testing.

Apshir
  • 193
  • 8
  • 1
    I'm afraid this is technically "cheating", since working with numbers up to 2*n requires an extra 1 bit of storage space per array entry over what is required to store the original numbers. In fact you need closer to log2(3) = 1.58 extra bits per entry, because you're storing numbers up to 3*n-1. – j_random_hacker Jul 14 '12 at 17:31
-3

If the array is not too large this solution is simpler, It creates another array of same size for ticking.

1 Create a bitmap/array of same size as your input array

 int check_list[SIZE_OF_INPUT];
 for(n elements in checklist)
     check_list[i]=0;    //initialize to zero

2 scan your input array and increment its count in the above array

for(i=0;i<n;i++) // every element in input array
{
  check_list[a[i]]++; //increment its count  
}  

3 Now scan the check_list array and print the duplicate either once or as many times they have been duplicated

for(i=0;i<n;i++)
{

    if(check_list[i]>1) // appeared as duplicate
    {
        printf(" ",i);  
    }
}

Of course it takes twice the space consumed by solution given above ,but time efficiency is O(2n) which is basically O(n).

Deepthought
  • 2,815
  • 2
  • 28
  • 39
  • oops ...! didnt notice that ... my bad . – Deepthought Jul 07 '12 at 14:02
  • @nikhil how is it O(1)?. My array check_list grows linearly as the size of input grows,so how is it O(1) if so what are the heuristics you are using to call it O(1). – Deepthought Jul 08 '12 at 15:09
  • For a given input you need constant space, isn't that O(1)? I could well be wrong :) – nikhil Jul 08 '12 at 17:40
  • My solution needs more space as the input grows. The efficiency (space/time) of an algorithm in not measured for a particular input.(In such case time efficiency of every searching algorithm would be constant i.e element found in the 1st index where we searched).Its measured for any input, thats the reason why we have best case ,worst case and average case. – Deepthought Jul 08 '12 at 18:11