Puzzle : finding out repeated element in an Array

Question

Size of an array is n.All elements in the array are distinct in the range of [0 , n-1] except two elements.Find out repeated element without using extra temporary array with constant time complexity.

I tried with o(n) like this.

   a[]={1,0,0,2,3};
    b[]={-1,-1,-1,-1,-1};
    i=0;
    int required;
    while(i<n)
    {
      b[a[i]]++;
      if(b[a[i]==1)
        required=a[i];
    }
    print required;

If there is no constraint on range of numbers i.e allowing out of range also.Is it possible get o(n) solution without temporary array.

Is this homework? This isn't really the kind of question for SO. (Unless it's homework.) — JoshD, Oct 08 '10 at 02:27
This can be done in `O(1)` time? Are you sure you're not messing it up with linear time? The solution I can think of still requires you to go through n elements. — wkl, Oct 08 '10 at 02:29
Similar to [ Algorithm to find two repeated numbers in an array, without sorting ](http://stackoverflow.com/questions/555744). — Matthew Flaschen, Oct 08 '10 at 02:31
You cannot solve this problem in constant-time. Since you don't know where the elements are in advance, and no ordering is stipulated, you require at least O(n) time. — Marcelo Cantos, Oct 08 '10 at 02:31
@Michael Petrotta:I tried by taking temporary array of size n-1.I want to reduce the space complexity.. — Jagan, Oct 08 '10 at 02:33
@Jagan: ok, that's a start. Please post the code you have so far. — Michael Petrotta, Oct 08 '10 at 02:34
http://stackoverflow.com/questions/2750393/jquery-ajax-call-succeeds-but-returns-nothing-jsonp — Hogan, Oct 08 '10 at 02:46
possible duplicate of [Algorithm to find two repeated numbers in an array, without sorting](http://stackoverflow.com/questions/555744/algorithm-to-find-two-repeated-numbers-in-an-array-without-sorting) — Hogan, Oct 08 '10 at 02:49
Use the original array as a bit mask --- flop the items to neg as you go over the list when you see one you have already changed flip them back and return it. Simple — Hogan, Oct 08 '10 at 02:55
@Hogan, that halfway works, as long as you assume the input is writable and that it's a large enough type to have an extra bit available in each slot. What if the array was bytes and `n=255`? You might say you can arrange for this not to be the case, but then what you're doing is actually giving yourself `O(n)` temporary space. So it's cheating. — R.. GitHub STOP HELPING ICE, Oct 08 '10 at 02:59
@R : I agree it is cheating, but why is a professor giving this kind of question anyway? -- it is not like it will ever help in applied CS. — Hogan, Oct 08 '10 at 03:02
I believe it was an interview question, not homework. I don't think it has much value in that context since I wouldn't expect a good programmer to necessarily come up with the answer on the spot, but ability to understand complexity theory and develop algorithms with optimal big-O (not just in time but also in space) has a lot of practical value. If nothing else, experience writing algorithms that do not need temporary space (and thus which have no failure cases) is quite valuable in real-world development. — R.. GitHub STOP HELPING ICE, Oct 08 '10 at 03:06
@R : Yes understanding complexity and able to write algorithms is useful. So are questions that actually come up. For example, one I've seen variations a million times -- db conversion with different codes for state -- how do you convert? Most of the time it is coded wrong. — Hogan, Oct 08 '10 at 03:17

MSN · Answer 1 · 2010-10-08T03:49:19.077

3

XOR all the elements together, then XOR the result with XOR([0..n-1]).

This gives you missing XOR repeat; since missing!=repeat, at least one bit is set in missing XOR repeat.

Pick one of those set bits. Iterate over all the elements again, and only XOR elements with that bit set. Then iterate from 1 to n-1 and XOR those numbers that have that bit set.

Now, the value is either the repeated value or the missing value. Scan the elements for that value. If you find it, it's the repeated element. Otherwise, it's the missing value so XOR it with missing XOR repeat.

edited Oct 08 '10 at 03:49

answered Oct 08 '10 at 03:28

MSN

53,214
7
75
105

Another `O(n log n)` approach, but completely different from mine. :-) – R.. GitHub STOP HELPING ICE Oct 08 '10 at 03:57
@R.. Are you sure this isn't linear complexity? – Mark B Oct 08 '10 at 04:15
Actually I think you're right. I read it as an iterative approach where you peel off a bit at a time, but indeed there are only 2 passes over the array and 2 passes over the values `1` to `n-1`. So I think this is the answer! – R.. GitHub STOP HELPING ICE Oct 08 '10 at 04:27
2

The last pass can be eliminated. When you are making the first pass, calculate the sum `S` (per Vlad) together with your XOR. The sign of the `S - SUM(0...n-1)` will tell you whether the missing is greater or smaller than the repeated. When you are making the second pass, you can also independently XOR the numbers that has that bit *not set* (and the same for numbers in `0...n-1` range). That way you will determine the *second* number as well. I.e. now you know both missing and repeated. Since you know whether the repeated is greater or smaller, you can tell which one is repeated right away. – AnT stands with Russia Oct 08 '10 at 05:00
1

@AndreyT: I like your first suggestion, but the second suggestion seems like a waste of time in the loop. It's easier just to do a single XOR operation at the end to get the other value. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 05:12
@R..: Yes, you are right. Since we know `missing XOR repeat` and we also know one of them (either `missing` or `repeat`), then a simple XOR will give us the other. No need for a XOR in the loop. – AnT stands with Russia Oct 08 '10 at 05:53

score 2 · Accepted Answer · 2010-10-08T02:42:12.857

2

Look what is first and last number
Calculate SUM(1) of array elements without duplicate (like you know that sum of 1...5 = 1+2+3+4+5 = 15. Call it SUM(1)). As AaronMcSmooth pointed out, the formula is Sum(1, n) = (n+1)n/2.
Calculate SUM(2) of the elements in array that is given to you.
Subtract SUM(2) - SUM(1). Whoa! The result is the duplicate number (like if a given array is 1, 2, 3, 4, 5, 3, the SUM(2) will be 18. 18 - 15 = 3. So 3 is a duplicate).

Good luck coding!

edited Oct 08 '10 at 02:42

answered Oct 08 '10 at 02:34

1

This isn't O(1), as it requires you to iterate over not one, but two arrays. Also, having an array without a duplicate counts as having a temporary array. – Rafe Kettler Oct 08 '10 at 02:36
1

@rafe Kettler, you can use the identity Sum(1, n) = (n+1)n/2 for step 2 – aaronasterling Oct 08 '10 at 02:37
I hope you'd optimize step 2... ;-) You should also note that the sum should be unsigned, and it doesn't matter if it wraps modulo `UINT_MAX+1`. Otherwise, the sum of `n` numbers if `O(n)` in space. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 02:39
@Aaron: I'm pretty sure OP was incorrect about `O(1)` time and really wanted `O(1)` space. Naively, this sum is `O(n)` in space, but it would be just as good to known the sum modulo `M` where `M>=n` since the difference modulo `M` will be sufficient to determine which number was duplicate. Thankfully, a convenient value for `M` is `UINT_MAX+1`. :-) – R.. GitHub STOP HELPING ICE Oct 08 '10 at 02:42
How do you check for dups in step 2 without a temp array? – sje397 Oct 08 '10 at 02:45
@R, that's a good trick. I don't see why it's needed though. Why is accumulating the sum while looping over the array O(n) in space? It seems like it would only require one `unsigned int` and an index for any size array (up until `UINT_MAX+1` would not suffice either. Is there a way to sum that's more naive than that? – aaronasterling Oct 08 '10 at 02:46
2

@Rafe: that is not a temporary array. Summing the numbers 1 through `n-1` does not require first putting them in an array, even if you forget the formula to do it. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 02:46
Incorrect even ignoring the complexity. The array has n elements with n-1 out of n different values. This means that not only is one of the values from 0 to n-1 duplicated, but a different value in that range is *missing*. But the problem can still be solved with O(1) space (not counting the input) and O(n) time. – aschepler Oct 08 '10 at 02:47
@Aaron: It's not quite as bad as I stated, especially since the integers being summed are in the range `0` to `n`. But in general, The sum of `n` fixed-size integers is `O(log n)` (not `O(n)` as I previously wrote) in space. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 02:49
@Vlad Lazarenko:consider the when element 5 is repeated.Does your logic works ? – Jagan Oct 08 '10 at 02:50
@aschepler: You might be onto something. I read the problem as the values being taken from the range 1 through `n-1`, but indeed it says 0 through `n-1`. It seems like there should be a missing element in addition to a duplicated element. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 02:51
@Jagan: Yes. `1+2+3+4+5=15. 1+2+3+4+5+5=20. 20-15=5. 5 is a duplicate.` – Oct 08 '10 at 02:52
2

@Vlad: If `n=6`, then the values are taken from the range 0 to 5. What if another value had been omitted instead of the 0 being omitted? – R.. GitHub STOP HELPING ICE Oct 08 '10 at 02:54
@Vlad Lazarenko:1+2+3+4+5=15. 1+1+2+3+4=11. 15-11=4. Is 4 duplicate ? – Jagan Oct 08 '10 at 02:56
@Jagan: You are breaking 'distinct range' requirement here. – Oct 08 '10 at 03:02
@R.. Good point. 0 can mess up stuff here. Don't know how to deal with that yet. Gotta work tomorrow so I go to bed. – Oct 08 '10 at 03:02
@Jagan: With Vlad's algorithm, S1=1+2+3+4=10 and S2=1+1+2+3+4=11. S2-S1=1. However it only works because the omitted element was 0. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 03:09
@R.. OK, the final idea is that if ARRAY[ABS(SUM2-SUM1)] == 0, then answer is 0, otherwise ABS(SUM2-SUM1). Now I can sleep :) – Oct 08 '10 at 03:11
In general, S2-S1 is equal to the duplicated element minus the missing element. I don't see how knowing this difference helps you quickly find the value of the duplicated element, however. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 03:11
1

@Vlad Lazarenko: No. The approach you suggested does not work, regardless of how you spin it. One value `A` is missing, another value `B` is duplicated (basically, substituted instead of `A`). That will increase the sum by `B - A`. That's what you find by your algorithm: the `B - A` value. However, knowing `B - A` is still not enough to determine neither `B` nor `A`. Knowing that the sum has increased by 2 would mean that 5 was replaced by 7 (7 is duplicated), or that 1 was replaced by 3 (3 is duplicated) or lots of other possibilities. – AnT stands with Russia Oct 08 '10 at 03:33
(By "increase" I mead "change": the value of `B - A` can be negative, of course). – AnT stands with Russia Oct 08 '10 at 03:46
See my answer. It's based on Vlad's, but fixing up the problem that you only know the difference of the missing and duplicate numbers and not their values. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 03:57
This just gives X-Y , you can check my answer which uses another condition for the calculation. – Igoy May 29 '12 at 09:34

sje397 · Answer 3 · 2010-10-08T02:53:47.160

2

Pick two distinct random indexes. If the array values at those indexes are the same, return true.

This operates in constant time. As a bonus, you get the right answer with probability 2/n * 1/(n-1).

edited Oct 08 '10 at 02:53

answered Oct 08 '10 at 02:42

sje397

41,293
8
87
103

You mean `2/(n*(n-1))`. Better than you thought! – aschepler Oct 08 '10 at 02:49
How is random constant? The number of expected lookups into the array increases with the array size, which is also known as `O(n)` – matt b Oct 08 '10 at 03:10
I'm only picking 2 indexes, once. It doesn't usually work - that's the point. An O(1) algorithm that always works is impossible. – sje397 Oct 08 '10 at 03:13
ah, I was assuming you kept trying until you returned true (or checked every index) – matt b Oct 08 '10 at 03:14

sje397 · Answer 4 · 2010-10-08T03:53:03.727

1

O(n) without the temp array.

a[]={1,0,0,2,3};
i=0;
int required;
while(i<n)
{
  a[a[i] % n] += n;
  if(a[a[i] % n] >= 2 * n)
    required = a[i] % n;
}
print required;

(Assuming of course that n < MAX_INT - 2n)

edited Oct 08 '10 at 03:53

answered Oct 08 '10 at 03:45

sje397

41,293
8
87
103

I consider this temporary space. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 03:56
@R, how do you consider this to use temp space? – AShelly Oct 08 '10 at 13:45

score 1 · Answer 5 · edited Jan 17 '12 at 08:14

This example could be useful for int, char, and string.

char[] ch = { 'A', 'B', 'C', 'D', 'F', 'A', 'B' };
Dictionary<char, int> result = new Dictionary<char, int>();
foreach (char c in ch)
{
   if (result.Keys.Contains(c))
   {
       result[c] = result[c] + 1;
   }
   else
   {
       result.Add(c, 1);
   }
}
foreach (KeyValuePair<char, int> pair in result)
{
   if (pair.Value > 1)
   {
       Console.WriteLine(pair.Key);
   }
}
Console.Read();

卢声远 Shengyuan Lu · Answer 6 · 2010-10-08T04:51:20.317

0

Lazy solution: Put the elements to java.util.Set one by one by add(E) until getting add(E)==false.

Sorry no constant-time. HashMap:O(N), TreeSet:O(lgN * N).

edited Oct 08 '10 at 04:51

answered Oct 08 '10 at 02:41

卢声远 Shengyuan Lu

31,208
22
85
130

O(n) (assuming `java.util.Set` has O(1) cost.) – aaronasterling Oct 08 '10 at 02:42
@AaronMcSmooth - I don't see how it could be O(1) – sje397 Oct 08 '10 at 03:05
@sje, I was a little sloppy, O(1) insert cost. Python's sets and dicts have O(1) for average case so it's certainly possible. And I'm referring to time complexity because that's what OP originally said. It's obvious that this breaks the requirement for O(1) additional space. – aaronasterling Oct 08 '10 at 03:18
1

Depends on whether the Set implementation is a HashSet or a TreeSet. With a TreeSet, you have O(lg n) insert cost, but presumably here you would use a HashSet, which should have O(1) amortized insert cost, if tell it the expected capacity (which you know) and possibly the load factor (though the default likely will suffice) – James Oct 08 '10 at 04:07
@AaronMcSmooth O(1) per element, but you have to insert `n` elements. – Mark B Oct 08 '10 at 04:12
@Mark which means O(n) which is exactly what I said. – aaronasterling Oct 08 '10 at 04:14
Only Java fans would really *get into* discussing data structures that are nowhere near `O(1)` space as part of an in-place-computation puzzle... – R.. GitHub STOP HELPING ICE Oct 08 '10 at 04:15
@James: you are right, it should be depended on whether the Set implementation. – 卢声远 Shengyuan Lu Oct 08 '10 at 04:17
@R Leave it to non java fans to be egotistical about their language choice? This is a traditional question. My favorite (hacky) answer is still: for (int i = 0; i < array.length; i++) { if (array[i] < 0) { return array[i]; // Dupe, we already negated this one } array[i] = -array[i]; } Voila. Constant space, linear time. – James Oct 08 '10 at 04:34
@James: nope, that's not constant space. It's `O(n)` space because you're assuming your input buffer has extra unused writable bits that you can use as temporary space. Whatever the size of your data type is, I can make `n` large enough that you don't have an extra bit. – R.. GitHub STOP HELPING ICE Oct 08 '10 at 04:43
@R Damn you and your logic! I accept your argument, but only if you bother to define the original array as unsigned, which most of the people asking don't think to do in advance (hence it generally sufficing, since they're giving N between 0 and some number :) – James Oct 08 '10 at 04:55
Here's where I'm supposed to make another cheap stab at Java for not having unsigned types.... ;-) – R.. GitHub STOP HELPING ICE Oct 08 '10 at 05:14

Hogan · Answer 7 · 2010-10-08T03:19:59.070

0

Build a lookup table. Lookup. Done.

Non-temporary array solution:

Build lookup into gate array hardware, invoke.

edited Oct 08 '10 at 03:19

answered Oct 08 '10 at 03:03

Hogan

69,564
10
76
117

Wouldn't that involve a temporary array? – sje397 Oct 08 '10 at 03:13
Huge temporary storage for sure... but it is constant size! – Hogan Oct 08 '10 at 03:18

R.. GitHub STOP HELPING ICE · Answer 8 · 2010-10-08T04:23:08.470

The best I can do is O(n log n) in time and O(1) in space:

The basic idea is to perform a binary search of the values 0 through n-1, passing over the whole array of n elements at each step.

Initially, let i=0, j=n-1 and k=(i+j)/2.
On each run through the array, sum the elements whose values are in the range i to k, and count the number of elements in this range.
If the sum is equal to (k-i)*(k-i+1)/2 + i*(k-i+1), then the range i through k has neither the duplicate nor the omitted value. If the count of elements is less than k-i+1, then the range has the omitted value but not the duplicate. In either case, replace i by k+1 and k by the new value of (i+j)/2.
Else, replace j by k and k by the new value of (i+j)/2.
If i!=j, goto 2.

The algorithm terminates with i==j and both equal to the duplicate element.

(Note: I edited this to simplify it. The old version could have found either the duplicate or the omitted element, and had to use Vlad's difference trick to find the duplicate if the initial search turned up the omitted value instead.)

score 0 · Answer 9 · answered Oct 09 '10 at 01:56

Based on @sje's answer. Worst case is 2 passes through the array, no additional storage, non destructive.

O(n) without the temp array.

a[]={1,0,0,2,3};
i=0;
int required;
while (a[a[i] % n] < n)    
   a[a[i++] % n] += n;

required = a[i] % n;
while (i-->0)
   a[a[i]%n]-=n;

print required;

(Assuming of course that n < MAX_INT/2)

Puzzle : finding out repeated element in an Array

9 Answers9

Linked