Moving elements from one vector to another using erase-remove paradigm

Question

This question clearly articulates how to move contents from one std::vector to another. In short, there is a std::move call required to physically move the memory, while there is also a std::erase call necessary to resize the original vector to account for the removed elements.

Is there a problem doing this using the erase-remove paradigm as one uses to delete from a vector while iterating over it (like here)?

For example, something like this:

// Vector with values [0, 1, ..., 19]
std::vector<int> myOldVec;
for (int i = 0; i < 20; i++) { myOldVec.push_back(i); }

// New vector to move into
std::vector<int> myNewVec;

// Move from myOldVec to myNewVec if value is less than 5
myOldVec.erase(
    std::remove_if(
        myOldVec.begin(),
        myOldVec.end(),
        [&](const int x)
        {
            if (x < 5)
            {
                myNewVec.push_back(x);
                return true;
            }
            return false;
        }
    ),
    myOldVec.end()
);

The intended output would be

myOldVec --> [5, 6, ..., 19]
myNewVec --> [0, 1, 2, 3, 4]

When I run this code it works in my tester. However, when dealing with objects instead of ints I worry that I'm not actually moving anything, but rather just referencing; for example, when doing the above with std::vector<MyObj> instead (with the appropriate lambda test).

Is this really performing a move, or am I right to be concerned that I'm just sharing a reference?

You're actually creating copies here, which for `int` is fine. But if you wanted to be more general and efficient with objects, you'd want to do `push_back(std::move(x))`. — Kurt Stutsman, Mar 03 '17 at 22:00
`const TYPE x` would have to be a bit brighter as well. `TYPE & x`should do it otherwise you're moving a const copy. — user4581301, Mar 03 '17 at 22:03

Nir Friedman · Accepted Answer · 2017-03-03T22:43:58.400

4

I think that generally the point of these algorithms, is that you are doing what you want to achieve by applying functions. Once the functions have side-effects, it seems like if anything the net result is maybe misleading, and it might just be better to do a for loop.

That said, remember that C++ is not Java. A vector<Foo> has to store Foo's, it just can't store references. However, there is still a problem with your whole idea.

myNewVec.push_back(x);

This line from your code will push a copy of x into your new vector. Because it's a copy, you don't have to worry about sharing references. Now, for integers, copies and moves are the same. But for complex objects (like, say, a vector), a move can be way, way faster than a copy. And the only vector is anyway getting rid of x, so we definitely want to move. So ideally we would change that line to:

myNewVec.push_back(std::move(x));

However, moving from an object clearly mutates it, and requires it not be const. The requirements for remove_if however require that the function object passed is a predicate. This, in turn means that:

The function object pred shall not apply any non-constant function through the dereferenced iterator.

http://en.cppreference.com/w/cpp/concept/Predicate

In other words, your function must accept the result of dereferencing the iterator, but is not supposed to mutate it. Because it is not supposed to mutate it, you can never move from the original object, but have to make a copy of it. Therefore, I don't think there is any conformant, efficient implementation of this idea for non-trivial types.

Here is a reasonable implementation:

template <class T, class F>
void transfer_if_not(std::vector<T>& old, std::vector<T>& new, F pred)
{
    auto part = std::partition(old.begin(), old.end(), pred);
    std::move(part, old.end(), std::back_inserter(new));
    old.erase(part);
}

This at least should not copy any elements. It will basically separate out the elements to stay and to leave in place in the original vector. Then efficiently move out the ones that are leaving. And then simply resize the array. As pointed out in the comments, this probably involves extra operations compared to what's optimal, but my sense is that the optimal version is not trivial to code up, and may involve tradeoffs (like more state), so it may not be a pure win.

Note that my algorithm accepts a vector specifically instead of iterators, because for other containers (say a linked list), this implementation is quite far from optimal.

edited Mar 03 '17 at 22:43

answered Mar 03 '17 at 22:04

Nir Friedman

17,108
2
44
72

You could probably use `std::partition` to partly simulate what `std::remove_if` is doing. May not be as efficient as a custom for loop, though--particularly if the objects have expensive move costs. – Kurt Stutsman Mar 03 '17 at 22:19
@KurtStutsman Hmm, maybe I'm missing something, where exactly is `std::partition` throwing away efficiency though that could obviously be recovered by a for loop? Your only alternative to partitioning is to delete elements you come across in place. But for a vector that will requiring moving every single item afterwards anyhow... Basically I'm not clear how a for loop can beat `partition` (assuming it's very well implemented of course). – Nir Friedman Mar 03 '17 at 22:24
With a for-loop you could duplicate exactly what `remove_if` does: iterate through vector and one-by-one move the matching elements into the other vector as they are found. You also keep track of the "end" of the vector to move non-matched elements to. Using `partition`, you first have to swap elements around to arrange them so that's either a swap or multiple moves. Then you have to move the elements again into the new vector. So if moves are expensive using `partition` may be worse. – Kurt Stutsman Mar 03 '17 at 22:28
@KurtStutsman If you move the elements directly from old to new vector, you still have the moved from objects in the old vector. So your new vector is good to go, but your old vector has "gaps" (actually valid, moved-from objects you don't want) in it. There's no way to remove the gaps from the vector without moving a bunch of stuff around. Partitioning + erase is the fastest way to do this for a vector. So for a vector, it's just a matter of transfer + partition, or partition + transfer. I think perf is about the same. For a linked list obviously it's not optimal to use partition. – Nir Friedman Mar 03 '17 at 22:32
@KurtStutsman some of your optimization is chucked out the window anyway. Think of all the copying being done with the vector resizes. Er... Yah. What Nir said. – user4581301 Mar 03 '17 at 22:33
@NirFriedman Moving from old vector to new vector is 1 move. Moving following elements to the new position is done as you iterate through the list, so that's 1 move per element. With `partition` if it cannot do a swap (not sure if it is even allowed to), you have to do 3 moves to swap two elements. I did say it *may* not be as efficient. In almost all cases it won't likely make a difference. As @user4581301 pointed out, reallocations will also be a factor unless you use `reserve` which you definitely would be if you're worried about move costs. – Kurt Stutsman Mar 03 '17 at 22:36
@KurtStutsman Lemme bang out what I'm thinking here. Post it in a few minutes and hopefully I won't get torpedo bombed for a non-answer. – user4581301 Mar 03 '17 at 22:37
@KurtStutsman I think I understand what you are saying now. Though, if you move the elements directly from the original vector to the new one, you will have to have a piece of extra state (like a bitset), so that you can make a second pass and erase-remove. So you will save move operations on the type but potentially pick up more costs of other types. I posted the partition based implementation, because I think it is still quite fast, works for move only types, and is easy to understand. Thanks for your insights. – Nir Friedman Mar 03 '17 at 22:47
Out of morbid curiosity I decided to test this out in [ideone](http://ideone.com/9rRzAy). This is in almost all cases a severe case of premature optimization. In the results you can see each number and the number of moves it incurred. Worth noting is that for unmoved elements the `partition` vs for loop variants had the exact same counts. Only matched elements incurred 4 moves with `partition` vs 2 with the for loop. So only with very heavy move operations combined with many elements being moved would this ever be noticed. – Kurt Stutsman Mar 03 '17 at 23:53
@KurtStutsman Eerily similar to what I was working on before I got dragged away: http://ideone.com/qE3GJh I reversed the order of the list because in order partition had to do nothing. Partition also mangled the list order which may or may not be bad. – user4581301 Mar 04 '17 at 00:08
1

Also partition sacrifices stability, which may or may not be important. – T.C. Mar 04 '17 at 15:22

Moving elements from one vector to another using erase-remove paradigm

1 Answers1