I want this boolean function to return true if the array contains any duplicate element and false if it doesn't. My code is:

bool containsDuplicate(vector<int>& nums) {
    int flag=0;
    for(int i=0;i<nums.size()/2;i++)
    {
        for(int j=1;j<nums.size();j++)
        {
            if(nums[i]==nums[j])
            {
                flag=1;
            }
        }
    }
    if(flag==1)
        return true;
    else
        return false;
}

It works for arrays containing duplicates, but it does not return false for arrays whose elements are all unique.

  • Offtopic: Please don't write `if(condition) return true; else return false;` but simply `return condition;` instead. – Aconcagua Jun 29 '22 at 10:12
  • In your case you could drop the `flag` entirely and just return from the function prematurely if the condition is met: `for() { for() { if() { return true; } } } return false;` – avoids needlessly continuing iteration on having found a duplicate already. – Aconcagua Jun 29 '22 at 10:14
  • This will fail: "1 2 3 4 5 6 6" – Goswin von Brederlow Jun 29 '22 at 10:28

2 Answers

Your solution has time complexity O(n^2). It will be great for small data, but for larger datasets it will be slow.

An extra data structure can greatly improve speed:

bool containsDuplicate(const vector<int>& nums) {
    std::unordered_set<int> seen;
    for (auto x : nums) {
        // insert().second is false when x is already in the set, i.e. a duplicate was found
        if (!seen.insert(x).second) return true;
    }
    return false;
}

[std::unordered_set<Key,Hash,KeyEqual,Allocator>::insert - cppreference.com](https://en.cppreference.com/w/cpp/container/unordered_set/insert)

Return value

1-2) Returns a pair consisting of an iterator to the inserted element (or to the element that prevented the insertion) and a bool denoting whether the insertion took place.

Since inserting into std::unordered_set has O(1) time complexity, the total time complexity of this solution is O(n). A performance maniac could add an arbitrary constant that determines which algorithm to use depending on the data size (as mentioned, your approach will be great for small data).
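
A hedged sketch of that idea: the function name containsDuplicateHybrid and the threshold below are mine, not part of the answer, and the cutoff is an arbitrary placeholder that should be replaced by a measured value.

#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical cutoff; the right value has to come from benchmarking on real data.
constexpr std::size_t kSmallInputThreshold = 64;

bool containsDuplicateHybrid(const std::vector<int>& nums) {
    if (nums.size() <= kSmallInputThreshold) {
        // Small input: the O(n^2) double loop avoids the hashing overhead.
        for (std::size_t i = 0; i < nums.size(); ++i)
            for (std::size_t j = i + 1; j < nums.size(); ++j)
                if (nums[i] == nums[j]) return true;
        return false;
    }
    // Larger input: O(n) on average with a hash set.
    std::unordered_set<int> seen;
    for (int x : nums)
        if (!seen.insert(x).second) return true;  // insert failed => x is a duplicate
    return false;
}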

Marek R
  • Oh, like that one – the price for it is additional memory, but generally worth it (unless perhaps on a microcontroller...) – Aconcagua Jun 29 '22 at 10:47
  • A potential O(n\*log(n)) alternative is sorting the vector (in place!) – *if* order of contents is irrelevant anyway, of course (a sketch of this appears after this comment thread)... – Aconcagua Jun 29 '22 at 10:50
  • Yes, assuming that you can modify the data. – Marek R Jun 29 '22 at 10:58
  • Oh, return values should be *inverse*: `containsDuplicate` -> insertion fails -> there's a duplicate -> `return true`... – Aconcagua Jun 29 '22 at 11:07
  • BTW this is not guaranteed to be the fastest. See the two answers: **[What is the fastest way to see if an array has two common elements?](https://stackoverflow.com/questions/69066787/what-is-the-fastest-way-to-see-if-an-array-has-two-common-elements)** – JeJo Jun 29 '22 at 11:11
  • Fully correct would be: *'[...] has amortised O(1) time complexity'* to account for potential re-allocations that may occur – [`reserve`ing](https://en.cppreference.com/w/cpp/container/unordered_set/reserve) would avoid that. – Aconcagua Jun 29 '22 at 11:13
  • @JeJo yes I've broken "Measure First" rule :/. – Marek R Jun 29 '22 at 12:09
  • Can you solve it in linear time and space? – UTKARSH LUCIFER Jun 29 '22 at 17:13
  • @UTKARSHLUCIFER this is linear in time and space. Did you mean in constant space complexity? – Marek R Jun 29 '22 at 17:58
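
A minimal sketch of the sort-based alternative mentioned in the comment thread above, assuming the caller is allowed to reorder the vector (the function name containsDuplicateBySorting is mine):

#include <algorithm>
#include <vector>

// O(n log n) time, no extra allocation, but it reorders the input,
// so only use it when element order does not matter.
bool containsDuplicateBySorting(std::vector<int>& nums) {
    std::sort(nums.begin(), nums.end());
    // adjacent_find returns an iterator to the first pair of equal neighbours,
    // or end() if no such pair exists.
    return std::adjacent_find(nums.begin(), nums.end()) != nums.end();
}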

You are testing the first half of the elements against themselves, and not testing for duplicates within the second half. You should instead test every element against later elements:

bool containsDuplicate(const vector<int>& nums) {
    for(int i=0;i<nums.size();i++)
    {
        for(int j=i+1;j<nums.size();j++)
        {
            if(nums[i]==nums[j])
            {
                return true;
            }
        }
    }
    return false;   
}
Caleth
  • Though I personally prefer `for(i = 1; ...) { for(j = 0; j < i; ...) { ... } }` for not having to call the `size` function again and again in the inner loop, at least ;) – Aconcagua Jun 29 '22 at 10:15
  • @Aconcagua I'd leave that sort of thing to the optimiser – Caleth Jun 29 '22 at 10:17
  • Why? If we can write more efficient code with even less pain (mine is shorter... – and the pain: I left the call to `size` in the outer loop for exactly that reason...)? And how far would we want to rely on the optimiser at all then? My favourite (counter-) example: update only existing entries: `if(map.find(key) != map.end()) { map[key] = newVal; }` instead of re-using the iterator returned... – Aconcagua Jun 29 '22 at 10:27
  • @Aconcagua "that sort of thing", not "everything". I'm reasonably confident that hoisting constants above a loop is one of the things that the implementations I use do. I'm not confident that they merge map lookups – Caleth Jun 29 '22 at 10:31
  • I can't prove it, but I have the (probably wrong) feeling that `for(j = 0; j < i; ++j)` is more likely to find a duplicate early. Or at least faster, because it starts with small loops where all data fits in cache and can cover a big chunk of the search space before it exceeds the L1, L2, L3 caches. The other way around it starts off exceeding the caches and then slowly gets down to fit in the L3, L2, L1 caches. And if it exits early (likely if there is a duplicate) it's best to start fast. – Goswin von Brederlow Jun 29 '22 at 10:34
  • Maybe I should have argued *'for its brevity'* right from the start ;) Never mind. My personal attitude is: Don't rely on the optimiser *unless* not doing so would lead to more complicated code – *then* we're in the field of premature optimisations... – Aconcagua Jun 29 '22 at 10:35
  • How about: `for (const auto & i : nums) { for (const auto & j : nums) { if (&i == &j) break; if (i == j) return true; } } return false;`? – Goswin von Brederlow Jun 29 '22 at 10:37
  • Well if I were writing this it'd be `template <class ForwardIt> bool containsDuplicates(ForwardIt first, ForwardIt last)` and there'd be no `size()` calls (a sketch of this appears after the comments) – Caleth Jun 29 '22 at 10:37
  • @GoswinvonBrederlow or `(&i != &j) && (i == j)` – Caleth Jun 29 '22 at 10:38
  • @Caleth the `break` exits the inner loop when i and j are the same object. It's the `for(j = 0; j < i; ++j)` written in an odd way. Your case would only skip the same-object case but not break the loop. – Goswin von Brederlow Jun 29 '22 at 10:38
  • @GoswinvonBrederlow About finding early: with duplicates at long distances (more than half the data length or so) the original variant is at an advantage. Amortised over arbitrary data with both short and long distances, the caching likely comes into play... – Aconcagua Jun 29 '22 at 10:41
  • @GoswinvonBrederlow The range-based approach results in many duplicate checks; it would correspond to `for(i = 0; i < num; ++i) { for(j = 0; j < num; ++j) { } }` but iterator-based... – Aconcagua Jun 29 '22 at 10:44
  • @Aconcagua No it doesn't. It breaks the inner loop so every comparison is unique. It's the same as: `for(i = 0; i < num; ++i) { for(j = 0; j < num; ++j) { if (i == j) break; ... } }` At least that's the design goal, might be buggy. – Goswin von Brederlow Jun 29 '22 at 10:53
  • I often wish for an iterator/index based range loop, maybe like: `for (auto *it : nums)`, where `it` is an iterator instead of a value. – Goswin von Brederlow Jun 29 '22 at 10:57
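
For reference, a minimal sketch of the iterator-based variant from Caleth's comment above; the signature follows the comment, while the body and the usage note are filled in here purely as an illustration:

#include <iterator>
#include <vector>

// Works with any forward range, makes no size() calls, still O(n^2) comparisons.
template <class ForwardIt>
bool containsDuplicates(ForwardIt first, ForwardIt last) {
    for (ForwardIt i = first; i != last; ++i)
        for (ForwardIt j = std::next(i); j != last; ++j)
            if (*i == *j) return true;
    return false;
}

// Usage:
// std::vector<int> nums{1, 2, 3, 2};
// bool dup = containsDuplicates(nums.begin(), nums.end());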