4

I have two vectors, one is a vector of indices of the other vector that I would like to erase. Currently I am doing the following:

#include <vector>
#include <iostream>
#include <string>

int main() {
        std::vector<std::string> my_vec;
        my_vec.push_back("one");
        my_vec.push_back("two");
        my_vec.push_back("three");
        my_vec.push_back("four");
        my_vec.push_back("five");
        my_vec.push_back("six");

        std::vector<int> remove_these;
        remove_these.push_back(0);
        remove_these.push_back(3);

        // remove the 1st and 4th elements
        my_vec.erase(my_vec.begin() + remove_these[1]);
        my_vec.erase(my_vec.begin() + remove_these[0]);

        my_vec.erase(remove_these.begin(), remove_these.end());

        for (std::vector<std::string>::iterator it = my_vec.begin(); it != my_vec.end(); ++it)
                std::cout << *it << std::endl;

        return 0;
}

But I think this is inelegant and inefficient. Further, I think I have to be careful to sort my remove_these vector and start from the end (that's why I erase index 3 before index 0). I would like to have one erase command, something like

my_vec.erase(remove_these.begin(), remove_these.end());

But of course that won't work because my_vec.erase() expects iterators referring to the same vector.

Jonathan Wakely
  • 166,810
  • 27
  • 341
  • 521
Xu Wang
  • 10,199
  • 6
  • 44
  • 78
  • What's your deletion criteria? Do you want to erase random elements? – K-ballo Dec 30 '12 at 20:08
  • 2
    if you want to remove many elements from a vector by indices, perhaps it would be easier to build a new vector with the ones you wish to keep instead. – didierc Dec 30 '12 at 20:13
  • @didierc: You should make an answer out of that. The current answers are, in my opinion, incomplete or not guaranteed to work on all implementations. – Benjamin Lindley Dec 30 '12 at 20:33
  • @BenjaminLindley done! Let me know (or make an edit), if you think it deserve some readjustment. – didierc Dec 30 '12 at 22:40

3 Answers3

6

A known idiom to erase elements from a standard sequence is the erase/remove idiom. You first call the remove algorithm that will move all the elements you want to keep to the front of the sequence, then you erase the removed elements at the back of your sequence. In C++11 it looks like this:

std::vector< std::string > strings;
strings.erase(
    std::remove_if(
        strings.begin(), strings.end()
      , []( std::string const& s ) -> bool
        {
            return /*whether to remove this item or not*/;
        }
    )
  , strings.end()
);
K-ballo
  • 80,396
  • 20
  • 159
  • 169
4
    std::sort(remove_these.begin(), remove_these.end());

    int counter = 0;
    auto end = std::remove_if(my_vec.begin(), my_vec.end(),
                             [&](const std::string&) mutable {
        return std::binary_search(remove_these.begin(), remove_these.end(), counter++);
    });
    my_vec.erase(end, my_vec.end());

This uses remove_if with a lambda function that returns true if the index of the current element (tracked by the variable counter) is found in the vector remove_these. That vector is sorted so that binary_search can be used, as an optimisation. If the list of elements to remove is small it might be faster to not sort it and just use this in the lambda:

        return std::find(remove_these.begin(), remove_these.end(), counter++) != remove_these.end();
Jonathan Wakely
  • 166,810
  • 27
  • 341
  • 521
  • `remove_these` is a `vector` so this wouldn't work as is – K-ballo Dec 30 '12 at 20:15
  • Sorry, yes this would work, assuming that `remove_if` will call your elements sequentially. Is there anything in the standard that prevents a `remove_if` in a `vector` from say going from _end_ to _begin_ instead? – K-ballo Dec 30 '12 at 20:17
  • The fact it works with forward iterators. (Technically there could be a specialization for random access iterators that worked backwards, but it would be a bit strange. A parallelisable `remove_if` might not work linearly though.) – Jonathan Wakely Dec 30 '12 at 20:21
  • Hmm, working backwards could have a performance advantage because you only move elements you have already visited and know you want to keep, you don't waste time moving elements that might later be removed. I'll have to investigate that ... – Jonathan Wakely Dec 30 '12 at 20:28
  • That was what I had in mind when I asked. I'm unsure about what the standard says about the implementation adding that kind of overloads... – K-ballo Dec 30 '12 at 20:30
  • Does the lambda have a return value..? – David G Dec 30 '12 at 20:50
  • @David: Simple lambdas that consist of a single `return` statement don't need an explicit return value. – K-ballo Dec 30 '12 at 21:26
  • Thanks Jonathan, I learned a lot from this example and from the comments on this answer. – Xu Wang Jan 04 '13 at 22:24
1

In your situation, I think that there are two problems worth being taken in account:

  • you are using a container with contiguous indices, so that every time an element is removed, all the elements after it get reindexed (and that's the reason why you had to do the removal in reverse order in your sample code),
  • that container also happens to store its elements contiguously, so that any deletion may trigger a reallocation, and at least provoque a copy of elements to satisfy the continuousness constraint.

Given these two issues, it might be interesting in some cases to copy the elements you wish to keep over to a new container, rather than do the removal. In your case, it appears that copying the elements should not be a big concern, as many implementations of std::string use a copy on write strategy, but you might want to verify that with your own.

Another thing to consider is that the set of indices to be removed can be nicely stored in a bit vector. It's fairly efficient, and simplify the algorithm significantly. You'll need to keep track of the effective number of elements removed though.

I would personally go for a simple loop, but C++ offers many ways to achieve a similar result. Here's the loop version:

    std::vector<bool> remove_these(my_vec.size(), false):
    remove_these[0] = remove_these[4] = true;

    std::vector<std::string> my_result;
    my_result.reserve(my_vec.size() - 2);

    for (int i = 0; i < remove_these.size(); ++i)
        if (!remove_these[i])
             my_result.push_back(my_vec[i]);

Note the use of reserve to avoid multiple reallocations during the vector filling.

Now, all is left to do is wrap the above code in a function which will transform beforehand the int vector in a bool vector:

template <typename IT>
void erase(std::vector<std::string> &my_vec, IT begin, IT end){
    std::vector<std::string> res;
    std::vector<bool> r(my_vec.size(), false);
    res.reserve(my_vec.size() - (end - begin));
    for (IT it = begin; it != end; ++it)
        r[*it] = true;
    for (int i = 0; i < r.size(); ++i)
        if (!r[i])
            res.push_back(my_vec[i]);
    my_vec = res;
}

That would be it. The time complexity of the algorithm is about O(N+M), where N and M are the sizes of my_vec and remove_these. Alternatively, one could replace the second loop with a remove_if.

In fact, if the stl provided a function to iterate over a sequence like remove_if and call a predicate function taking in parameter the key and value of that iterator, we could use it by feeding it with my_vec reverse iterators, and a lambda to check if the given key is in remove_these, but the time complexity would be a bit higher than the solution above.

Community
  • 1
  • 1
didierc
  • 14,572
  • 3
  • 32
  • 52
  • Thanks didierc. I changed my strategy and now use the "keep" approach rather than "erase" approach. Thank you especially for the great discussion and explanations! – Xu Wang Jan 04 '13 at 22:26