0

What is the simplest way to remove duplicates from a C++ std::vector<std::string> ? I want the order to be kept.

For example:

std::vector<std::string> container;
container.push_back("z");
container.push_back("y");
container.push_back("x");
container.push_back("z");

And at the end, I simply want my vector to contain (in order) : z, y, x.

In order to remove the duplicates, I could simply add each vector item into a set/unordered_set, but it would modify the order based on the criterion of the default comparison object.

jean553
  • 621
  • 10
  • 29
  • 3
    The simplest way would be traversing the vector, adding the elements one by one to a set, and appending them to another vector only if not found in the set. – n. m. could be an AI Jan 27 '18 at 12:15

2 Answers2

1

A simple way is to iterate through the vector while keeping track of the elements encountered, and deleting those that have been encountered before.

Here is a piece of code that does exactly that.

std::unordered_set<std::string> encounters;
for (auto i = 0u; i < container.size(); ++i) {
    if (!encounters.insert(container[i]).second) {
        // The string was already in encounters
        container.erase(container.begin() + i);
        --i;
    }
}

Live on Coliru.

It could probably be optimized, for example by deleting ranges of elements when all are duplicates, or maybe by swapping each new element with the current first duplicate and, at the end, erasing the whole end of the vector that contains all the duplicates.

Nelfeal
  • 12,593
  • 1
  • 20
  • 39
  • Works perfectly well. Thanks. I guess you use std::unordered_set instead of std::set for performance reasons, right ? – jean553 Jan 27 '18 at 13:01
  • You could attribute the use of `unordered_set` to performance, but in this case the improvement is minimal. I just use `unordered_set` and `unordered_multiset` by default when I need to reason about the element values (in this case, uniqueness). `set` and `multiset` are useful for keeping a bunch of elements sorted. And even then, I often find myself making use of `vector` and `sort`. If you need performance for this particular problem, follow the link in Ron's comment. – Nelfeal Jan 27 '18 at 13:10
-1

you could create set and then iterate over vector, copy the elements from set into the vector, and than remove each element from set which is already copied. for ex:

std::vector<int> v{1,1,2,2,3,3};
std::set<int> s(v.begin(), v.end());
vector<int> v2(s.size());                  // v2 will contain unique elements 
                                           // from v in the same order
for (int i = 0, j=0; i < v.size(); ++i) {
    if (s.find(v[i]) != s.end()) {
       v2[j++] = v[i];
       s.erase(v[i]);
    }
}

v.assign(v2.begin(), v2.end());
StPiere
  • 4,113
  • 15
  • 24