22

I want to check if a vector of integers has any duplicates or not, and have to return true if it does. So I try to do something like this:

vector<int> uGuess = {1,2,3,3,4,5}
vector<int> a = uGuess;
sort(a.begin(), a.end());
bool d = unique(a.begin(), a.end());

And this will not work since unqiue cannot be assigned as a bool value. How should I proceed towards this? If I were to write a for loop to perform the same action, how should I do that?

boriaz50
  • 860
  • 4
  • 17
rrc
  • 275
  • 1
  • 3
  • 7
  • 2
    #myhomeworkonSO – Gabriel Sep 28 '17 at 20:29
  • 1
    `unique` doesn't do what you are describing you want to do – patatahooligan Sep 28 '17 at 20:30
  • 7
    `auto it = std::unique(a.begin(), a.end()); bool b = it == a.end();` – Jarod42 Sep 28 '17 at 20:30
  • Check a [reference](http://en.cppreference.com/w/cpp/algorithm/unique). – Rakete1111 Sep 28 '17 at 20:30
  • 1
    @Jarod42 doesn't work on non-sorted vectors – patatahooligan Sep 28 '17 at 20:33
  • 1
    @patatahooligan: In the example it is sorted as initialized, then it gets sorted. I think it's safe to say it's sorted. – Fred Larson Sep 28 '17 at 20:34
  • @FredLarson True, but OP's statement of the problem does not mention that the array is sorted. OP should clarify. – patatahooligan Sep 28 '17 at 20:35
  • 1
    @patatahooligan It doesn't matter, he calls `sort()` in his code, so then it's sorted. – Barmar Sep 28 '17 at 20:38
  • 1
    I kid you not, I somehow skipped over that line before. In that case @Jarod42 's comment could be the accepted answer. – patatahooligan Sep 28 '17 at 20:40
  • "How should I proceed towards this" depends on what **this** is. Don't post code that doesn't work and ask people to guess at what it's intended to do; **say it**. What should the value in `d` mean? – Pete Becker Sep 28 '17 at 20:48
  • 1
    Seems a duplicate of [Determining if an unordered vector has all unique elements](https://stackoverflow.com/questions/2769174/determining-if-an-unordered-vectort-has-all-unique-elements) and [Determine if there are duplicates in vector](https://stackoverflow.com/questions/49252730/determine-if-there-are-duplicates-in-vector). – Pablo H Apr 20 '20 at 22:47

7 Answers7

30

The algorithm you're looking for is std::adjacent_find.

// The container must be sorted!
const std::vector<int> sortedVector = {1,2,3,3,4,5};
const bool hasDuplicates = std::adjacent_find(sortedVector.begin(), sortedVector.end()) != sortedVector.end();

Unlike std::unique, std::adjacent_find doesn't modify the container.

As a bonus, std::adjacent_find returns an iterator to the first element in the duplicate "pair":

const auto duplicate = std::adjacent_find(sortedVector.begin(), sortedVector.end());

if (duplicate != sortedVector.end())
    std::cout << "Duplicate element = " << *duplicate << "\n";
Olivia Stork
  • 4,660
  • 5
  • 27
  • 40
Jan Holecek
  • 2,131
  • 1
  • 16
  • 26
19

You should using set

set<int> s(a.begin(), a.end());
return s.size() != a.size();
Giang
  • 2,384
  • 2
  • 25
  • 26
  • 1
    You mean `return s.size()==a.size();`, right? I like this approach because I do something similar in python, but do you have any idea how fast this is vs sorting? – snooze_bear Apr 17 '19 at 16:18
  • 2
    @snooze_bear This should be as fast as sorting but if you use `std::unordered_set` in place of `std::set` it can be faster as it is a hash set. – Sajal Apr 18 '21 at 09:20
  • why does it not have O(n) complexity where n = a.size() ? – Shubham Garg Oct 14 '21 at 10:41
  • @ShubhamGarg Late reply for late readers: Insertion complexity for `std::set` is `O(log(s))` with `s` being current size of the set – there are `n` elements, and `O(sum(i = 0 -> n) { log(i) })` (is that understandable?) remains `O(n*log(n))`... – Aconcagua Jun 29 '22 at 10:58
16

Looking at google for std::unique I found this page std::unique. I looked at what it did:

Eliminates all except the first element from every consecutive group of equivalent elements from the range [first, last)

So it looks like it does what you want - removes the duplicates.

I then looked at what it returns...

... returns a past-the-end iterator for the new logical end of the range

So the result from std::unique is a sequence which is not necessary the same as the whole vector.

If nothing was removed, the return value would be the end of the vector.

So you want:

vector<int>::iterator it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Or for C++11:

auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Finally for the unique function to work, the vector needs to be sorted, so the complete code would be:

sort(a.begin(), a.end());
auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());
Olivia Stork
  • 4,660
  • 5
  • 27
  • 40
mksteve
  • 12,614
  • 3
  • 28
  • 50
  • 2
    _"So it looks like it does what you want - removes the duplicates."_ What?! Where in the world does the OP ask for removing duplicates? This answer is wrong as `std::unique` modifies the vector. – Zakk Jun 29 '22 at 10:10
  • @Zakk Modifying a *duplicate vector* seems fine – QA tried that her-/himself, too, but didn't manage to apply the correct checks afterwards... – Aconcagua Jun 29 '22 at 11:04
  • @Aconcagua My comment was about removing the duplicates. That was not what the OP asked for. – Zakk Jun 29 '22 at 15:11
  • @Zakk Not directly – but the initial idea has been since right from the start removing duplicates from a copy and then compare sizes – if differing, removal took place thus duplicates are still available in original unchanged vector – solely the appropriate test for removal detection has been lacking... – Aconcagua Jun 30 '22 at 07:26
4

If someone is forced to write own algorithm:

bool hasDuplicates(const std::vector<int>& arr) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        for (std::size_t j = i + 1; j < arr.size(); ++j) {
            if (arr[i] == arr[j])
                return true;
        }
    }
    return false;
}

But in real code you should use things that already exist, and in the standard library.

boriaz50
  • 860
  • 4
  • 17
  • 1
    That leaves the question: what function in the STL is the best fit for this problem? – Ben Jones Jan 27 '21 at 22:00
  • 1
    I believe this will have bad performance for large vectors due to caches. If you use `for (std::size_t j = 0; j < i; ++j)` you test the small ranges first, which fits in cache, and only at the end do slow searches outside of cache. It's likely to terminate early if there is a duplicate and never reach the slow part. – Goswin von Brederlow Jun 29 '22 at 10:44
2

Sort the vector if it's not already sorted, and then use std::unique(), like this:

#include <iostream>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> v = {3, 1, 3, 4, 5};
    sort(v.begin(), v.end());
    auto it = std::unique(v.begin(), v.end());
    std::cout << ((it == v.end()) ? "Unique\n" : "Duplicate(s)\n");
    return 0;
}

Output:

Duplicate(s)

gsamaras
  • 71,951
  • 46
  • 188
  • 305
1

So far all these solutions either modify the container or have O(n²) complexity. You can put a std::map to much better use:

#include <algorithm>
#include <iterator>
#include <map>

template <typename Iterator>
bool has_duplicates( Iterator first, Iterator last )
{
  std::map <typename std::iterator_traits <Iterator> ::value_type, std::size_t> histogram;

  while (first != last)
    if (++histogram[ *first++ ] > 1) 
      return true;

  return false;
}

#include <iostream>
#include <vector>

int main()
{
  using std::begin;
  using std::end;

  int a[] = { 2, 3, 5, 7, 11 };
  int b[] = { 2, 3, 5, 5, 7 };

  std::vector <int> c( begin(a), end(a) );
  std::vector <int> d( begin(b), end(b) );

  std::cout << std::boolalpha;
  std::cout << "a has duplicates false : " << has_duplicates( begin(a), end(a) ) << "\n";
  std::cout << "b has duplicates true  : " << has_duplicates( begin(b), end(b) ) << "\n";
  std::cout << "c has duplicates false : " << has_duplicates( begin(c), end(c) ) << "\n";
  std::cout << "d has duplicates true  : " << has_duplicates( begin(d), end(d) ) << "\n";
}
Dúthomhas
  • 8,200
  • 2
  • 17
  • 39
  • 1
    It will use more memory and not be faster than sorting a vector. – Sopel Oct 06 '17 at 17:04
  • Yes, but sorting a vector _modifies_ the original content. – Dúthomhas Oct 06 '17 at 21:14
  • Of course. Look, I don’t think you understand the algorithm here and/or have completely ignored its termination conditions. For a large vector of completely unique elements, the memory requirements are not quite as friendly, but there is a trade-off _entirely dependent on OP’s data and his operating requirements_. Why not give OP all the options instead of blindly clunking along ‘sort a copy and write your own test’ for a very-likely minor memory issue when you could be using the power of the Standard Library to efficiently histogram for you? – Dúthomhas Oct 07 '17 at 12:03
  • 1
    Then you should have specified in the answer that this solution is only good if the number of unique elements is relatively small (or less probabilistically, if a duplicate is close to the beginning of the array). – Sopel Oct 07 '17 at 12:12
  • No, it is only _bad_ if the source data is particularly large _and_ unlikely to have any duplicates. Please quit making unprovable generalized statements about probability and requirements. (If your data set is more likely to have a duplicate near the end, then reverse-walk your source, LOL.) – Dúthomhas Oct 07 '17 at 12:26
1

If your vector is small, say < 32 objects, or if copying and sorting the objects is expensive or impossible due to lack of move or copy constructor/assignment then a straight O(n^2) compare everything against everything else algorithm is the way to go.

Here is my solution:

template <typename Iterator>
bool has_duplicates( Iterator first, Iterator end ) {
    for (auto i = first; i != end; ++i) {
        for (auto j = first; i != j; ++j) {
            if (*i == *j) return true;
        }
    }
    return false;
}

template <typename Container>
bool has_duplicates(const Container &v) {
    for (const auto & i : v) {
        for (const auto & j : v) {
            if (&i == &j) break;
            if (i == j) return true;
        }
    }
    return false;
}
Goswin von Brederlow
  • 11,875
  • 2
  • 24
  • 42