Select specific elements from a vector

Question

I have a vector v1, and a boolean vector v2 of the same size. I want to delete from v1 all values such that the parallel element of v2 is false:

vector<int> v3; // assume v1 is vector<int>
for (size_t i=0; i<v1.size(); i++)
    if (v2[i])
        v3.push_back(v1[i]);
v1=v3;

Is there a better way to do it?

in C++03
in C++11

@user2079303 ... and then assigns the copy back to `v1`. It's a form of erase/remove idiom. — Igor Tandetnik, Jul 14 '16 at 13:14
@IgorTandetnik Oh, right. Makes sense now, that I see the assignment :) — eerorika, Jul 14 '16 at 13:16
Are you 100% sure you want a new *vector* and not a range (i.e., something that has a begin() and an end())? — lorro, Jul 14 '16 at 13:31
Surprised nobody's mentioned zip iterators yet. http://stackoverflow.com/a/12553437/560648? — Lightness Races in Orbit, Jul 14 '16 at 13:45
Surprised nobody questioned the existence of `v2`. It feels like purely transient information that is unnecessarily captured in a variable. Why is that? At the time where you have your yes/no decision taken, just remove that element from `v1` right then and there, don't capture it in a boolean only to use that boolean later. That is, call `remove_if` with whatever predicate was used to generate the information in `v2`. — screwnut, Jul 14 '16 at 17:57
@screwnut - "At the time where you have your yes/no decision taken, just remove that element from v1 right then and there" - Remove it with erase()? Serious? — user31264, Jul 14 '16 at 18:08
@screwnut - `vector::erase()` takes linear time. Removing each offending elements with erase() means quadratic time complexity. `vector::erase()` also invalidates all pointers, references, iterators to the subsequent elements. This function is slow, unsafe, and should generally be avoided. (I hope you are not going to say "then use lists".) On top of this, we may need the offending element to determine the validity of other elements. — user31264, Jul 14 '16 at 19:18
@user31264 But all the answers make use of `erase` including the one you accepted. Also, consider the true complexity of `erase`, it's not just plainly ["linear"](http://en.cppreference.com/w/cpp/container/vector/erase). I feel like I'm missing some crucial information here... _On top of this, we may need the offending element to determine the validity of other elements._ Well, there it is... — screwnut, Jul 14 '16 at 20:41
" But all the answers make use of erase including the one you accepted." - wrong. Not all. More important, the answer which I accepted, as well as most others, uses erase ONCE, don't you see it yourself? "Also, consider the true complexity of erase, it's not just plainly "linear"." - wrong. " I feel like I'm missing some crucial information here..." - wrong. "On top of this, we may need the offending element to determine the validity of other elements. Well, there it is..." - I needed the algorithm so many times in various circumstances that I now want it for general case. — user31264, Jul 14 '16 at 22:51
PS: "But all the answers make use of erase including the one you accepted." - not only the answer I accepted, as well as most other answers, use `erase` only once, they also use it for final part of the array. This special case of `vector::erase` is fast and safe. — user31264, Jul 14 '16 at 22:59

score 20 · Accepted Answer · answered Jul 14 '16 at 13:09

20

size_t last = 0;
for (size_t i = 0; i < v1.size(); i++) {
  if (v2[i]) {
    v1[last++] = v1[i];
  }
}
v1.erase(v1.begin() + last, v1.end());

Same as yours essentially, except it works in-place, not requiring additional storage. This is basically a reimplementation of std::remove_if (which would be difficult to use directly, because the function object it uses is given a value, not an index or iterator into the container).

answered Jul 14 '16 at 13:09

Igor Tandetnik

50,461
4
56
85

10

If `v1` actually contains something more complex than `int`, this could be optimised a bit more by doing: `v1[last++] = std::move(v1[i]);`. – Angew is no longer proud of SO Jul 14 '16 at 13:18
This one is definitly compatible with every version – Ceros Jul 14 '16 at 13:18
@Angew is s = move(s) required to work at least for STL data types? – RiaD Jul 14 '16 at 14:04
@RiaD All the STL containers that I know of have a move constructor. For custom classes that do not have a move constructor, use of `std::move` would cause the copy constructor be be called instead. So, you get a benefit of speed if a move constructor if available, and no loss of compatibility if there is no move constructor. – Eldritch Cheese Jul 14 '16 at 15:01
@EldritchCheese, the question is in move assignment (not ctoring) to the very same object. I do understand that the have move assignment operator, but the question is whether it's allowed to assgin to the very same object (consider case where v2[0] is true) – RiaD Jul 14 '16 at 15:48
as far as I remember implementation is allowed to suppose that rvalue argument is the only link to the object and write implementation accordingly (and for example for vector/string first implementation that comes to mind doesn't support s = move(s) case) – RiaD Jul 14 '16 at 15:50
1

@Angew Self-move-assignment is a no-no. – T.C. Jul 14 '16 at 23:30
@T.C. Right of course. So that "optimisation" idea of mine would break horribly :-( I'll keep the comment around for reference of the discussion, though. – Angew is no longer proud of SO Jul 15 '16 at 08:19

aruisdante · Answer 2 · 2016-07-14T13:41:17.043

19

In C++11 you can use std::remove_if and std::erase with a lambda, which is the "erase-remove-idiom":

size_t idx = 0;
v1.erase(std::remove_if(v1.begin(),
                          v1.end(),
                          [&idx, &v2](int val){return !v2[idx++];}),
           v1.end())

And here's a link to it functioning as intended: cpp.sh/57jpc

As the comments point out however, there's a bit of discussion about the safety of doing it this way; the underlying assumption here is that std::remove_if will apply the predicate to the elements of v1 in order. However, the language in the doc doesn't explicitly guarantee this. It simply states:

Removing is done by shifting (by means of move assignment) the elements in the range in such a way that the elements that are not to be removed appear in the beginning of the range. Relative order of the elements that remain is preserved and the physical size of the container is unchanged. Iterators pointing to an element between the new logical end and the physical end of the range are still dereferenceable, but the elements themselves have unspecified values (as per MoveAssignable post-condition). A call to remove is typically followed by a call to a container's erase method, which erases the unspecified values and reduces the physical size of the container to match its new logical size.

Now, it would be difficult with only a forward iterator to a std::vector to both guarantee stability for results and not apply the predicate in-order. But it's certainly possible to do so.

edited Jul 14 '16 at 13:41

answered Jul 14 '16 at 13:14

aruisdante

8,875
2
30
37

3

I wonder to what extent it's guaranteed that `idx` and `val` would stay in sync; that the function object will be called for each value in a proper sequence. – Igor Tandetnik Jul 14 '16 at 13:16
`std::remove_if` just iterates across the container, so `idx` should always point to the "original" location of an element in the container. AFAIK, it should always hit the elements in their original order. `val` in this case isn't actually used, it's just there as a stub to complete the function signature since the OP said the data in `v1` doesn't actually matter for determining the removal condition, just the data in `v2`. – aruisdante Jul 14 '16 at 13:17
@aruisdante The standard does *not* guarantee the order in which the predicate will be applied to the sequence. – Angew is no longer proud of SO Jul 14 '16 at 13:21
3

@ildjarn The requirements on stable algorithms (**[algorithm.stable]**) say that the relative order of *elements* should be preserved. I don't see where it states that the *predicate* should be called for each element in order. `for_each` is the only algorithm I know of that explicitly guarantees this; the fact that it *has* to spell this out suggests to me that, absent such language, the implementation is given the latitude to call predicate out of order. – Igor Tandetnik Jul 14 '16 at 13:23
@Angew You're correct in that the algorithm itself doesn't, but it does state that it is using a forward-iterator, which for `std::vector` will "do the right thing". It would not work for containers where the order of iteration doesn't match the index ordering. – aruisdante Jul 14 '16 at 13:23
@IgorTandetnik : This is all the C++11 standard has to say about it, from [alg.remove]: "*Effects: Eliminates all the elements referred to by iterator `i` in the range [`first`,`last`) for which the following corresponding conditions hold: `*i == value`, `pred(*i) != false`. Remarks: Stable.*" – ildjarn Jul 14 '16 at 13:23
1

@aruisdante Forward iterators are not input iterators; they are multi-pass. It would be perfectly possible to apply the predicate out of order, perhaps even in parallel. – Angew is no longer proud of SO Jul 14 '16 at 13:27
@IgorTandetnik Yeah, I definitely agree that it's ambiguous here. – aruisdante Jul 14 '16 at 13:30
@Angew for sure, you *could* do things like that. I mean, I doubt it does (the parallel bit) because then it would have to worry about thread-safety of accessing underling elements and the standard would have to warn you about it, but it certainly could. – aruisdante Jul 14 '16 at 13:32
What I'm more surprised at is that no one noticed I accidentally did `std::erase` instead of `v1.erase` :p – aruisdante Jul 14 '16 at 13:33
1

As `std::remove_if` operates doesn't it shift the elements down from the one moved to the end of the container? This would damage the corelation between the two vectors. – Galik Jul 14 '16 at 13:39
@Galik It absolutely modifies the container in-place, but that wouldn't change the number of times the predicate is called, which is what is incrementing the index. The only thing this implementation requires is that the predicate is applied across the iterator *in order*. Which, as the comments point out and I've edited my answer to more clearly disclose, there is some debate on if it is guaranteed to do so, or simply happens to do so for most stdlib implementations. – aruisdante Jul 14 '16 at 13:44
@IgorTandetnik C++17 actually introduces an `ExecutionPolicy` template parameter that would guarantee sequential operation vs. in-parallel operation: http://en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t – aruisdante Jul 14 '16 at 13:49
@aruisdante Yes I see what you mean. It is possible the standard uses a *shift* rather than *swaps* to ensure things like this work as expected. I mean I would have thought a *swap* would have been more efficient. – Galik Jul 14 '16 at 13:49
@Galik In C++11 the introduction of `std::move` makes a shift equal complexity to a `swap` in the worst case. – aruisdante Jul 14 '16 at 13:52
1

@aruisdante It's "sequenced", not "sequential" - very different things. "Sequenced" means "single-threaded" essentially - the opposite of "unsequenced" which means "possibly running in parallel on different threads". It says nothing about the order of calls, only that they don't run in parallel. – Igor Tandetnik Jul 14 '16 at 13:56

manlio · Answer 3 · 2021-05-08T08:41:08.433

9

A remove_if-based alternative is:

v1.erase(std::remove_if(v1.begin(), v1.end(),
                        [&v1, &v2](const int &x){ return !v2[&x - &v1[0]]; }),
         v1.end());

Also consider that if you only need a view on v1 in which some elements are skipped, you can avoid to modify v1 and use something like boost::filter_iterator.

edited May 08 '21 at 08:41

answered Jul 14 '16 at 13:23

manlio

18,345
14
76
126

Yakk - Adam Nevraumont · Answer 4 · 2016-07-14T19:58:54.440

I hear you like lambdas.

auto with_index_into = [](auto&v){
  return [&](auto&& f){
    return [&,f=decltype(f)(f)](auto& e){
      return f( std::addressof(e)-v.data(), e );
    };
  };
};

This may be useful. It takes a .data() suporting container, then returns a lambda of type ((Index,E&)->X)->(E&->X) -- the returned lambda converts an indexed element visitor to an element visitor. Sort of lambda judo.

template<class C, class Test>
auto erase_if( C& c, Test&& test) {
  using std::begin; using std::end;
  auto it=std::remove_if(begin(c),end(c),test);
  if (it==end(c)) return false;
  c.erase(it, end(c));
  return true;
}

because I hate the remove erase idiom in client code.

Now the code is pretty:

erase_if( v1, with_index_into(v1)(
  [](std::size_t i, auto&e){
    return !v2[i];
  }
));

The restriction on moves in remove/erase should mean it invokes the lambda on the element in its original position.

We can do this with more elemental steps. It gets complicated in the middle...

First, tiny named operator library:

namespace named_operator {
  template<class D>struct make_operator{};

  enum class lhs_token {
    star = '*',
    non_char_tokens_start = (unsigned char)-1,
    arrow_star,
  };

  template<class T, lhs_token, class O> struct half_apply { T&& lhs; };

  template<class Lhs, class Op>
  half_apply<Lhs, lhs_token::star, Op>
  operator*( Lhs&& lhs, make_operator<Op> ) {
    return {std::forward<Lhs>(lhs)};
  }
  template<class Lhs, class Op>
  half_apply<Lhs, lhs_token::arrow_star, Op>
  operator->*( Lhs&& lhs, make_operator<Op> ) {
    return {std::forward<Lhs>(lhs)};
  }

  template<class Lhs, class Op, class Rhs>
  auto operator*( half_apply<Lhs, lhs_token::star, Op>&& lhs, Rhs&& rhs )
  {
    return named_invoke( std::forward<Lhs>(lhs.lhs), Op{}, std::forward<Rhs>(rhs) );
  }

  template<class Lhs, class Op, class Rhs>
  auto operator*( half_apply<Lhs, lhs_token::arrow_star, Op>&& lhs, Rhs&& rhs )
  {
    return named_next( std::forward<Lhs>(lhs.lhs), Op{}, std::forward<Rhs>(rhs) );
  }
}

Now we define then:

namespace lambda_then {
  struct then_t:named_operator::make_operator<then_t> {} then;

  template<class Lhs, class Rhs>
  auto named_next( Lhs&& lhs, then_t, Rhs&& rhs ) {
    return
      [lhs=std::forward<Lhs>(lhs), rhs=std::forward<Rhs>(rhs)]
      (auto&&...args)->decltype(auto)
    {
      return rhs( lhs( decltype(args)(args)... ) );
    };
  }
}
using lambda_then::then;

which defines a token then such that lambda1 ->*then* lambda2 returns a function object that takes its arguments, passes it to lambda1, then passes the return value to lambda2.

Next we define to_index(container):

template<class C>
auto index_in( C& c ) {
  return [&](auto& e){
    return std::addressof(e)-c.data();
  };
}

we also keep the above erase_if.

This results in:

erase_if( v1,
  index_in(v1)
  ->*then*
  [&](auto i){
    return !v2[i];
  }
);

solving your problem (live example).

`f=decltype(f)(f)` Are you creating a member copy of `f`? Why that syntax? — SirGuy, Jul 14 '16 at 14:24
@guygreer perfect forwarding with an `auto&&x` argument is easiest done with `decltype(x)(x)`. And as the lambda may be an rvalue, using a mere reference to it is rude — Yakk - Adam Nevraumont, Jul 14 '16 at 14:36
nice solution. Totally unreadable of course, but nice use of c++jitsu :) +1 — Richard Hodges, Jul 14 '16 at 17:40
@rich I think I can make it better with more pieces. Like `erase_if(v1,element_to_index(v1)->*then*[&](auto i){return !v2[i];}));` — Yakk - Adam Nevraumont, Jul 14 '16 at 18:08
@RichardHodges Done. Better name for `element_to_index` as bonus. — Yakk - Adam Nevraumont, Jul 14 '16 at 19:59
@yakk I've just been inspired. imagine: `auto x = y *unless* (a == b) *implying* c;` — Richard Hodges, Jul 14 '16 at 20:12

Galik · Answer 5 · 2016-07-14T21:45:52.110

I actually quite like the way you did it but I would make a couple of changes in limiting the scope the temporary vector is used and I would use std::vector::swap to avoid a copy at he end. If you have C++11 you could use std::move instead of std::vector::swap:

#include <vector>
#include <iostream>

int main()
{
    std::vector<int> iv = {0, 1, 2, 3, 4, 5, 6};
    std::vector<bool> bv = {true, true, false, true, false, false, true};

    // start a new scope to limit
    // the lifespan of the temporary vector
    {
        std::vector<int> v;

        // reserve space for performance gains
        // if you don't mind an over-allocated return
        // v.reserve(iv); 

        for(std::size_t i = 0; i < iv.size(); ++i)
            if(bv[i])
                v.push_back(iv[i]);

        iv.swap(v); // faster than a copy
    }

    for(auto i: iv)
        std::cout << i << ' ';
    std::cout << '\n';
}

In C++11 you can just use `std::move` instead of `std::swap` to avoid the copy and make intent more explicit. — aruisdante, Jul 14 '16 at 13:30
While we're at optimizing: `v.reserve(iv.size())` would prevent multiple resizes at the cost of over-allocating the vector. — Martin Ba, Jul 14 '16 at 21:34

score 2 · Answer 6 · answered Jul 14 '16 at 13:33

Different version that erases elements in place but does not require as many moves as Igor's algo requires and in case of small amount of elements to be erased could be more efficient:

using std::swap;
size_t last = v1.size();
for (size_t i = 0; i < last;) {
   if( !v2[i] ) {
       --last;
       swap( v2[i], v2[last] );
       swap( v1[i], v1[last] );
   } else 
       ++i;
}
v1.erase(v1.begin() + last, v1.end());

but this algo is unstable.

score 1 · Answer 7 · answered Jul 14 '16 at 16:39

1

If you use a list (or forward_list for C++11) instead of a vector, you can do this in-place without the moving/allocating/copying overhead required for vector operations. It's perfectly possible to do most storage-related things with any STL container, but appropriate choice of containers will often give significant improvements in performance.

answered Jul 14 '16 at 16:39

Graham

1,655
9
19

The use of a `list` to remove elements at a minimum requires a move of the "next" pointer to remove a node and the deallocation of the deleted node for _each deletion_. I won't even bring up the performance impact that traipsing all over memory to follow links has on the cache... Measuring the move to a list before abandoning a vector. – David Thomas Jul 14 '16 at 17:37
@DavidThomas Of course, but it may be a lower impact than moving the entire contents of the vector. If you've only got a few elements then sure, stick with the vector. If you've got thousands or millions, you're probably better with a list in-place or with setting up a new vector, and you may be better with a deque so that adding new elements is cheap. If you've got thousands of millions, you probably always want to do it in-place because you don't want the RAM hit of holding a duplicate. – Graham Jul 15 '16 at 10:15

Select specific elements from a vector

7 Answers7