Check std::vector has duplicates

Question

I want to check if a vector of integers has any duplicates or not, and have to return true if it does. So I try to do something like this:

vector<int> uGuess = {1,2,3,3,4,5}
vector<int> a = uGuess;
sort(a.begin(), a.end());
bool d = unique(a.begin(), a.end());

And this will not work since unqiue cannot be assigned as a bool value. How should I proceed towards this? If I were to write a for loop to perform the same action, how should I do that?

`auto it = std::unique(a.begin(), a.end()); bool b = it == a.end();` — Jarod42, Sep 28 '17 at 20:30
Check a [reference](http://en.cppreference.com/w/cpp/algorithm/unique). — Rakete1111, Sep 28 '17 at 20:30
@patatahooligan: In the example it is sorted as initialized, then it gets sorted. I think it's safe to say it's sorted. — Fred Larson, Sep 28 '17 at 20:34
@FredLarson True, but OP's statement of the problem does not mention that the array is sorted. OP should clarify. — patatahooligan, Sep 28 '17 at 20:35
@patatahooligan It doesn't matter, he calls `sort()` in his code, so then it's sorted. — Barmar, Sep 28 '17 at 20:38
I kid you not, I somehow skipped over that line before. In that case @Jarod42 's comment could be the accepted answer. — patatahooligan, Sep 28 '17 at 20:40
"How should I proceed towards this" depends on what **this** is. Don't post code that doesn't work and ask people to guess at what it's intended to do; **say it**. What should the value in `d` mean? — Pete Becker, Sep 28 '17 at 20:48
Seems a duplicate of [Determining if an unordered vector has all unique elements](https://stackoverflow.com/questions/2769174/determining-if-an-unordered-vectort-has-all-unique-elements) and [Determine if there are duplicates in vector](https://stackoverflow.com/questions/49252730/determine-if-there-are-duplicates-in-vector). — Pablo H, Apr 20 '20 at 22:47

score 30 · Answer 1 · edited Jan 27 '22 at 15:11

The algorithm you're looking for is std::adjacent_find.

// The container must be sorted!
const std::vector<int> sortedVector = {1,2,3,3,4,5};
const bool hasDuplicates = std::adjacent_find(sortedVector.begin(), sortedVector.end()) != sortedVector.end();

Unlike std::unique, std::adjacent_find doesn't modify the container.

As a bonus, std::adjacent_find returns an iterator to the first element in the duplicate "pair":

const auto duplicate = std::adjacent_find(sortedVector.begin(), sortedVector.end());

if (duplicate != sortedVector.end())
    std::cout << "Duplicate element = " << *duplicate << "\n";

score 19 · Answer 2 · answered Feb 17 '19 at 12:26

19

You should using set

set<int> s(a.begin(), a.end());
return s.size() != a.size();

answered Feb 17 '19 at 12:26

Giang

2,384
2
25
26

1

You mean `return s.size()==a.size();`, right? I like this approach because I do something similar in python, but do you have any idea how fast this is vs sorting? – snooze_bear Apr 17 '19 at 16:18
2

@snooze_bear This should be as fast as sorting but if you use `std::unordered_set` in place of `std::set` it can be faster as it is a hash set. – Sajal Apr 18 '21 at 09:20
why does it not have O(n) complexity where n = a.size() ? – Shubham Garg Oct 14 '21 at 10:41
@ShubhamGarg Late reply for late readers: Insertion complexity for `std::set` is `O(log(s))` with `s` being current size of the set – there are `n` elements, and `O(sum(i = 0 -> n) { log(i) })` (is that understandable?) remains `O(n*log(n))`... – Aconcagua Jun 29 '22 at 10:58

score 16 · Accepted Answer · edited Jan 27 '22 at 15:16

16

Looking at google for std::unique I found this page std::unique. I looked at what it did:

Eliminates all except the first element from every consecutive group of equivalent elements from the range [first, last)

So it looks like it does what you want - removes the duplicates.

I then looked at what it returns...

... returns a past-the-end iterator for the new logical end of the range

So the result from std::unique is a sequence which is not necessary the same as the whole vector.

If nothing was removed, the return value would be the end of the vector.

So you want:

vector<int>::iterator it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Or for C++11:

auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Finally for the unique function to work, the vector needs to be sorted, so the complete code would be:

sort(a.begin(), a.end());
auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

edited Jan 27 '22 at 15:16

Olivia Stork

4,660
5
27
40

answered Sep 28 '17 at 20:37

mksteve

12,614
3
28
50

2

_"So it looks like it does what you want - removes the duplicates."_ What?! Where in the world does the OP ask for removing duplicates? This answer is wrong as `std::unique` modifies the vector. – Zakk Jun 29 '22 at 10:10
@Zakk Modifying a *duplicate vector* seems fine – QA tried that her-/himself, too, but didn't manage to apply the correct checks afterwards... – Aconcagua Jun 29 '22 at 11:04
@Aconcagua My comment was about removing the duplicates. That was not what the OP asked for. – Zakk Jun 29 '22 at 15:11
@Zakk Not directly – but the initial idea has been since right from the start removing duplicates from a copy and then compare sizes – if differing, removal took place thus duplicates are still available in original unchanged vector – solely the appropriate test for removal detection has been lacking... – Aconcagua Jun 30 '22 at 07:26

boriaz50 · Answer 4 · 2017-10-06T16:59:17.807

4

If someone is forced to write own algorithm:

bool hasDuplicates(const std::vector<int>& arr) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        for (std::size_t j = i + 1; j < arr.size(); ++j) {
            if (arr[i] == arr[j])
                return true;
        }
    }
    return false;
}

But in real code you should use things that already exist, and in the standard library.

edited Oct 06 '17 at 16:59

answered Sep 28 '17 at 21:12

boriaz50

860
4
17

1

That leaves the question: what function in the STL is the best fit for this problem? – Ben Jones Jan 27 '21 at 22:00
1

I believe this will have bad performance for large vectors due to caches. If you use `for (std::size_t j = 0; j < i; ++j)` you test the small ranges first, which fits in cache, and only at the end do slow searches outside of cache. It's likely to terminate early if there is a duplicate and never reach the slow part. – Goswin von Brederlow Jun 29 '22 at 10:44

gsamaras · Answer 5 · 2021-07-31T15:19:29.803

2

Sort the vector if it's not already sorted, and then use std::unique(), like this:

#include <iostream>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> v = {3, 1, 3, 4, 5};
    sort(v.begin(), v.end());
    auto it = std::unique(v.begin(), v.end());
    std::cout << ((it == v.end()) ? "Unique\n" : "Duplicate(s)\n");
    return 0;
}

Output:

Duplicate(s)

edited Jul 31 '21 at 15:19

answered Sep 28 '17 at 21:18

gsamaras

71,951
46
188
305

1

Here, the original vector gets changed. So, this can't be the useful method. – abhi3700 Feb 10 '21 at 06:42
@gsamaras The input vector has to be sorted. Am I right? – John Jul 31 '21 at 09:06
Yes the vector needs to be sorted, answer updated! Thanks @John for the comment and the upvote! – gsamaras Jul 31 '21 at 15:20

score 1 · Answer 6 · answered Sep 28 '17 at 21:28

1

So far all these solutions either modify the container or have O(n²) complexity. You can put a std::map to much better use:

#include <algorithm>
#include <iterator>
#include <map>

template <typename Iterator>
bool has_duplicates( Iterator first, Iterator last )
{
  std::map <typename std::iterator_traits <Iterator> ::value_type, std::size_t> histogram;

  while (first != last)
    if (++histogram[ *first++ ] > 1) 
      return true;

  return false;
}

#include <iostream>
#include <vector>

int main()
{
  using std::begin;
  using std::end;

  int a[] = { 2, 3, 5, 7, 11 };
  int b[] = { 2, 3, 5, 5, 7 };

  std::vector <int> c( begin(a), end(a) );
  std::vector <int> d( begin(b), end(b) );

  std::cout << std::boolalpha;
  std::cout << "a has duplicates false : " << has_duplicates( begin(a), end(a) ) << "\n";
  std::cout << "b has duplicates true  : " << has_duplicates( begin(b), end(b) ) << "\n";
  std::cout << "c has duplicates false : " << has_duplicates( begin(c), end(c) ) << "\n";
  std::cout << "d has duplicates true  : " << has_duplicates( begin(d), end(d) ) << "\n";
}

answered Sep 28 '17 at 21:28

Dúthomhas

8,200
2
17
39

1

It will use more memory and not be faster than sorting a vector. – Sopel Oct 06 '17 at 17:04
Yes, but sorting a vector _modifies_ the original content. – Dúthomhas Oct 06 '17 at 21:14
Of course. Look, I don’t think you understand the algorithm here and/or have completely ignored its termination conditions. For a large vector of completely unique elements, the memory requirements are not quite as friendly, but there is a trade-off _entirely dependent on OP’s data and his operating requirements_. Why not give OP all the options instead of blindly clunking along ‘sort a copy and write your own test’ for a very-likely minor memory issue when you could be using the power of the Standard Library to efficiently histogram for you? – Dúthomhas Oct 07 '17 at 12:03
1

Then you should have specified in the answer that this solution is only good if the number of unique elements is relatively small (or less probabilistically, if a duplicate is close to the beginning of the array). – Sopel Oct 07 '17 at 12:12
No, it is only _bad_ if the source data is particularly large _and_ unlikely to have any duplicates. Please quit making unprovable generalized statements about probability and requirements. (If your data set is more likely to have a duplicate near the end, then reverse-walk your source, LOL.) – Dúthomhas Oct 07 '17 at 12:26

score 1 · Answer 7 · answered Jun 29 '22 at 10:51

If your vector is small, say < 32 objects, or if copying and sorting the objects is expensive or impossible due to lack of move or copy constructor/assignment then a straight O(n^2) compare everything against everything else algorithm is the way to go.

Here is my solution:

template <typename Iterator>
bool has_duplicates( Iterator first, Iterator end ) {
    for (auto i = first; i != end; ++i) {
        for (auto j = first; i != j; ++j) {
            if (*i == *j) return true;
        }
    }
    return false;
}

template <typename Container>
bool has_duplicates(const Container &v) {
    for (const auto & i : v) {
        for (const auto & j : v) {
            if (&i == &j) break;
            if (i == j) return true;
        }
    }
    return false;
}

Check std::vector has duplicates

7 Answers7

Linked

Related