erase duplicate elements keeping order

Question

I would like to eliminate duplicate elements from a vector while keeping the vector in the current order.

Below I have a proposed implementation. First, is this safe?

Second, is there a better way to do it, either more efficiently or better from a "using C++ algorithms and not reinventing the wheel" perspective.

#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>

int main()
{
    using namespace std;

    std::vector<int> v= {1, 7, 2, 3, 8, 4, 5, 3, 2, 3, 2, 6, 2, 3, 2, 9, 10, 1, 2, 2, 1};
    std::vector<int>::iterator finalEnd = v.end();
    for (auto vIter = v.begin(); vIter != v.end(); ++vIter) {
        for (auto nextvIter = vIter + 1; nextvIter != v.end(); ++nextProjIter) {
            if (*vIter == *nextvIter)
                finalEnd = std::remove(vIter, finalEnd, *nextvIter);
        }
    }
    v.erase(finalEnd, v.end());

    for(auto p : v)
        cout << p << "  ";

    //Should return:  1  7  2  3  8  4  5  6  9  10

    return EXIT_SUCCESS;
}

No, I'm afraid the shown logic is fatally flawed and has certain edge cases where it will fail. And, yes, to answer the question that's being asked, yes, there is a much better way to do it, using a set or an unordered_set to keep track of duplicates. Should take about half as much code, and be much simpler, using only one loop. See your C++ textbook for more information on using sets and other associative containers. — Sam Varshavchik, Dec 18 '20 at 11:53
Use std::set if possible. Elements are ordered and duplicates are not possible (at least for POD types from scratch). — Secundi, Dec 18 '20 at 11:54

PaulMcKenzie · Accepted Answer · 2020-12-18T17:23:09.643

One of many ways this can be accomplished it to use std::unordered_set to keep track of duplicates and std::stable_partition to partition the duplicates from the lone values while preserving the order of the items:

#include <iostream>
#include <vector>
#include <algorithm>
#include <unordered_set>

int main()
{
    std::unordered_set<int> numSet;
    std::vector<int> v= {1, 7, 2, 3, 8, 4, 5, 3, 2, 3, 2, 6, 2, 3, 2, 9, 10, 1, 2, 2, 1};
    auto iter = std::stable_partition(v.begin(), v.end(), [&](int n) 
           { bool ret = !numSet.count(n); numSet.insert(n); return ret; }); // returns true if the item has not been "seen"
    v.erase(iter, v.end());           
    for(auto p : v)
        std::cout << p << "  ";
}

Output:

1  7  2  3  8  4  5  6  9  10

The std::stable_partition will return true if the item has not been seen, thus place it to the left of the partition point. Once done, an iterator to the partition point is returned, and we use this iterator to do one single erasure from that point to the end of the vector. Note that the lambda function updates the unordered_set for each item processed.

The reason why std::stable_partition was used instead of std::remove_if is that std::remove_if is not guaranteed to process the items in order. For example, it could have been possible for an implementation to process the second 1 in that data first, instead of the first 1. So to be safe stable_partition will not erase elements, but simply place the elements in the correct position, ready for the erasure at the end.

Paul, this works well and answers the question. One more thing: in my desire to simplify the problem, I used int as the datatype in the original question. In fact, my data is of boost::filesystem::path objects. When I use the above function on a vector of path objects, I get errors in MSVC 2019, specifically 'std::hash<_Kty>::hash(const std::hash<_Kty> &)': attempting to reference a deleted function. I realize this isn't part of the original post, but if you had any suggestion to this, I'd be appreciative. — Madeleine P. Vincent, Dec 18 '20 at 13:14
The issue is probably that the objects for a `std::unordered_set` requires that they have a requisite hash function (you will need to provide it as the second template argument to `unordered_set`. Since you are using boost, there is [boost::hash](https://www.boost.org/doc/libs/1_75_0/doc/html/hash/custom.html). Maybe create a hash on the full path name would be one way to devise such a hash for filesystem path's. — PaulMcKenzie, Dec 18 '20 at 17:21

tango-2 · Answer 2 · 2020-12-18T12:23:09.330

By constructing a new vector, you can initialize this vector to be non-duplicate. You can use the find function for this. I suggest you search for std :: find

std::vector<int> v= {1, 7, 2, 3, 8, 4, 5, 3, 2, 3, 2, 6, 2, 3, 2, 9, 10, 1, 2, 2, 1};
std::vector<int> nonDuplicateVect;

for (int element : v)
    if(std::find(nonDuplicateVect.begin(), nonDuplicateVect.end(), element) == nonDuplicateVect.end())
        nonDuplicateVect.push_back(element);

for (int element : nonDuplicateVect)
    std::cout << element << " ";

std::cout << "\n";

erase duplicate elements keeping order

2 Answers2