Find duplicate in an array

Question

Given a read only array of n + 1 integers between 1 and n, find one number that repeats in linear time using less than O(n) space and traversing the stream sequentially O(1) times.

Sample Input: [3 4 1 4 1]
Sample Output : 1/4(any one of these)

If there are multiple possible answers (like in the sample case above), output any one.

If there is no duplicate, output -1.

I tried doing a solution of this which is:

int Solution::repeatedNumber(const vector<int> &A) {

    vector<bool> v(A.size(), true);

    for (int i = 0; i < A.size(); i++) {
        if (v[A[i]])
            v[A[i]] = false;
        else
            return A[i];
    }
}

This is getting accepted but how is this less than O(n) in memory?

This solution uses `O(n)` additional space. It's not less than `O(n)`. — François Andrieux, Jun 19 '17 at 18:44
Well you use bool instead of int, so maybe its just checking if memory used is < n*sizeof(int) ? I'm not sure, since it seems that you are right, the memory would grow at a rate of n. — Carl Shiles, Jun 19 '17 at 18:46
@Rabbid76 I'm not aware of any size guaranties for `std::set` but it's necessarily at least as big as the sum of it's elements. If you need to store up to `n` elements, then your memory requirement is at least `O(n)`. — François Andrieux, Jun 19 '17 at 18:58
@Rabbid76 `O(n)` and `O(n-1)` is the same complexity. And `O(n)` is a lower bound, `std::set` may take up more memory than that, as far as I know there are no guaranties provided by the standard. — François Andrieux, Jun 19 '17 at 19:01
By the way, although it seems that everybody here already has this in their mind, the return value can never be `-1`, since there are `n+1` inputs from integers `1` to `n`, what's known as pigeonhole principle. — nglee, Jun 20 '17 at 00:16

score 4 · Answer 1 · edited Jun 20 '20 at 09:12

You are correct in wondering why this would be accepted. This answer is obvious O(n) space complexity. You allocating some amount of data that grows directly proportionally with n, making it O(n) space. Whatever is judging your program is incorrectly accepting it. It may be possible that the judge is accepting your score because you are using less bytes than are allocated by A, but that is only speculation.

EDIT: The code bellow isn't actually a solution to the problem. It is a solution to a simpler problem along the lines of the above. The solution below ignores the constraint that the stream must be read only. After doing some research, it appears that this problem is a very difficult version of a series of similar problems of the type "Given a range of numbers between 1 and n, find the repeating/missing number". If there were only one number repeated, and there was only a O(n) time requirement, you could use a bool vector as above. If there were only one number repeated, but you were constrained to constant space, you could implement this solution where we use gauss's formula to find the sum of integers from 1 to n, and subtract that from the sum of the array. If the array had two missing numbers, and you were constrained to constant time, you could implement this solution where we use the sum and product of the array to create a system of equations which can be solved in O(n) time with O(1) space.

To solve the question posed above, it looks like one would have to implement something to the order of this monstrosity.

Here is a solution this problem within its constraints:

You could do something like this:

#include<vector>
#include<iostream>
int repeating(std::vector<int>& arr)
{
  for (int i = 0; i < arr.size(); i++)
  {
    if (arr[abs(arr[i])] >= 0)
      arr[abs(arr[i])] = -arr[abs(arr[i])];
    else {
      return abs(arr[i]);
    }
  }
}
int main()
{
        std::vector<int> v{1,2,3,4,5,1};

        std::cout<<repeating(v)<<std::endl;
        std::cout<<sizeof(v)*sizeof(v[0])<<std::endl;
        return 0;
}

The above program uses the input array itself to track duplicates. For each index i, the array evaluates arr[i]. The array sets arr(arr[i]) negative. Negating a value is an easily reversible operation (simply take the absolute value of the element), so it can be used to mark an index of the array without ruining the integrity of the data. If you ever encounter an index such that arr[abs(arr[i])] is negative, you know that you have seen abs(arr[i])) before in the array. This uses O(1) space complexity, traverses the array once, and can be modified to return any or all duplicate numbers.

It is O (1) space, not time complexity. That wasnt clear so I edited my answer. — Jayson Boubin, Jun 19 '17 at 20:07
i accidentally downvoted and can't take it back unless you edit it :( really sorry :( — seleciii44, Jun 19 '17 at 21:33
@seleciii44 I just saw some wording that needed updating anyway so I updated my answer — Jayson Boubin, Jun 19 '17 at 21:36
by the way, the function argument 'arr' should be a reference. otherwise the space complexity increases. — seleciii44, Jun 19 '17 at 22:18
Your suggestion uses O(n) memory, because you pass a copy of `v`. If you passed `v` by reference, it would violate the 'read-only' constraint of the question. — chtz, Jun 20 '17 at 10:55
@chtz You're correct. I edited my answer with an explanation. — Jayson Boubin, Jun 20 '17 at 13:30

score 3 · Answer 2 · answered Jun 19 '17 at 20:25

std::vector<bool> is a bitset, so it will use n bits. In Big-O notation, O(n/8)=O(n), that means the space is not less than O(n).

I assume they do not look at the actual program, but only measure its space consumption in some example runs. So, using a bit vector tricks it into believing that it is better than O(n).

But I agree with you. It should not be accepted.

score 1 · Answer 3 · answered Jul 12 '17 at 15:21

I have a solution which requires O(sqrt(N)) space and O(N) time, and traverses the list twice -- assuming it is possible to calculate the integer square root in O(1) time (for arbitrary large N, this is likely at least an O(log(N)) operation).

First allocate an integer array A1 of size ceil(sqrt(N)), filled with 0.
Iterate through your array, for each element x
- compute k=floor(sqrt(x))
- increment A1[k]
- If A1[k]>2k+1, there must be at least one duplicate between k² and (k+1)²-1. (For k=floor(sqrt(N)) the threshold is N-k²). Rememberk` and break first iteration
optionally delete first array
Allocate a boolean array A2 of size 2k+1 filled with false.
Iterate through all x again:
- Check if A2[x-k²] is set, if yes, x is a duplicate
- Otherwise, increment A2[x-k²]

The solution should also work for larger and smaller arrays (does not need to be exactly N+1), and if there are no duplicates, the first iteration will run to the end. Both temporary arrays are O(k) (if you are pedantic, the first one is O(k*log(k)), since it must store integers up to size sqrt(N)).

Upvoted for giving a solution to the problem presented - with minimal change from one of the naive approaches, at that. For an advanced O(1) additional memory solution not modifying the input, see [Finding a duplicate as a cycle entry-point](https://stackoverflow.com/a/48841544/3789665). — greybeard, Feb 20 '18 at 14:19

score 0 · Answer 4 · edited May 20 '19 at 18:25

0

std::vector<bool> isn't like any other vector.

std::vector<bool> is a possibly space-efficient specialization of std::vector for the type bool.

That's why it may use up less memory because it might represent multiple boolean values with one byte, like a bitset.

edited May 20 '19 at 18:25

xashru

3,400
2
17
30

answered May 20 '19 at 14:35

RDP

1
1

score 0 · Answer 5 · answered Aug 06 '22 at 13:51

solution suggested by @jayson Boubin in above answers is O(1)-space method and it is good(by the way itis awesome!! ), when changing the original array is allowed or mean changing doesn't matters. but if altering the original array is not allowed then the well-known solution is of O(sqrt(n))-space and O(n)-time, and this method basically suggests that we should first consider sqrt(n)-ranges whereas ith range will be [sqrt(n)*i to sqrt(n)*(i+1)] , and after that we traverse the array and count the no. of elements lies in for each range and so onn...

have a look on it: leetcode: find the duplicate number

score -1 · Answer 6 · answered Jun 19 '17 at 18:45

-1

Well it's constant (O(1)) in memory because you're simply doing a comparison in place, and not creating a new data structure to house anything or for any comparison.

You could also use a Hash Table like unordered_set, but that'd be using O(N) memory - but remain O(N) time complexity.

I'm not entirely sure if this was an "accepted" solution by the way (what you posted, because that is creating a vector of size (sizeofA) - but just offering a solution based on your needs.

answered Jun 19 '17 at 18:45

Travis

55
8

3

*and not creating a new data structure to house anything or for any comparison.* Then what is `vector v(A.size(),true);`? – NathanOliver Jun 19 '17 at 18:49
1

@Travis not the best start, believe me. – iehrlich Jun 19 '17 at 18:55
First, no need for name calling. Secondly, the statement *Well it's constant (O(1)) in memory because you're simply doing a comparison in place, and not creating a new data structure to house anything or for any comparison.* is wrong. The OP does create an additional data structure. – NathanOliver Jun 19 '17 at 18:55
@Travis NathanOliver is correct. The solution requires an allocation of `n` booleans, it's at least `O(n)` in memory. – François Andrieux Jun 19 '17 at 18:57
Okay, I was under the impression it strictly had to be less than O(N), not at LEAST. Edit: Also, I was not sure if the OP posted it. Not sure why I'm explaining this twice now though. I misunderstood the OP... – Travis Jun 19 '17 at 19:01
@Travis Make sure not to confuse the requirements of the solution provided by OP and the requirements supplied with the question OP was given. The question OP was trying to solve requires *less* than `O(n)`. The solution he posted has complexity `O(n)`, which shouldn't have been accepted but was. He is asking why it might have been accepted. – François Andrieux Jun 19 '17 at 19:04
This does provide an answer to the question, but a wrong one. – polkovnikov.ph Jun 20 '17 at 12:46

Find duplicate in an array

6 Answers6

Linked