Erase some of a vector's elements in a for-each loop without iterating the whole vector

Question

I have a vector and I am searching for an element in it while iterating over the vector with a for-each loop. If I find any invalid elements during the search, I would like to remove them from the vector.

Basically, I want to do something like this:

for (auto el : vec) {
    if (el == whatImLookingFor) {
        return el;
    } else if (isInvalid(el)) {
        vec.erase(el);
    }
}

I looked at some other questions like this and this, but both recommend using std::remove_if. That will iterate over the whole vector and remove all invalid elements, instead of iterating only until I find the element I'm looking for, and ignoring any elements after that.

What would be a good way to do this?

Don't you need an iterator as an argument for [std::vector::erase()](http://en.cppreference.com/w/cpp/container/vector/erase) ??? — JFMR, Apr 28 '18 at 17:52
The [first question](https://stackoverflow.com/a/3938847) you linked to has a code snippet at the top which is basically what you want - just insert a check for whether the elements is the one you want, and if so `break`. — hnefatl, Apr 28 '18 at 17:53
Range-for is usually not the right tool if the underlying range is to be modified. — Baum mit Augen, Apr 28 '18 at 17:53
Also see [std::vector iterator invalidation](https://stackoverflow.com/q/3747691/608639). It is probably the duplicate because it deals with the invalid iterator caused by `erase`. — jww, Apr 28 '18 at 18:05

score 20 · Accepted Answer · edited Apr 29 '18 at 17:10

20

You should still use std::remove_if, just call std::find beforehand.

auto el = std::find(vec.begin(), vec.end(), whatImLookingFor);
auto p = std::remove_if(vec.begin(), el, isInvalid);

// returns the iterator, not the element itself.
// if the element is not found, el will be vec.end()
return vec.erase(p, el);

This will usually be more efficient than removing one element at a time.

edited Apr 29 '18 at 17:10

answered Apr 28 '18 at 18:09

Benjamin Lindley

101,917
9
204
274

Where did `v` come from in the first line? – wizzwizz4 Apr 29 '18 at 12:15

J. Merdich · Answer 2 · 2018-05-02T00:14:24.477

8

I was curious about the performance of this, so I ran a quick naive benchmark comparing Benjamin's find then partial clean and hnefatl's for-loop. Benjamin's is indeed faster: 113x faster. Impressive.

But I was curious where most of the computation was going, as it was greater than the sum of remove_if and find, which are the only functions that actually iterate through the array. As it turns out, however, the vec.erase line in his code is actually pretty slow. This is because in remove_if he is cleaning the area from the start to the location of the found value, which results in a gap in the middle of the array from the invalid values. vec.erase must then copy the remaining values to fill the gap and finally resize the array.

I ran a third test with a full-sized remove_if/vec.erase followed by a find, so the gap happens at the end and no copy is necessary, just truncation. It turned out to be about 15% faster and leaves the whole vector clean. Note that this assumes that testing for validity is cheap. If it's more than a few simple comparisons, Benjamin's answer will win hands-down, as pointed out by Chris in the comments.

The code

auto p = std::remove_if(vec.begin(), vec.end(), isInvalid);
vec.erase(p, vec.end());
return std::find(vec.begin(), vec.end(), whatImLookingFor);

The Benchmark

edited May 02 '18 at 00:14

answered Apr 29 '18 at 03:28

J. Merdich

338
1
5

Interesting online-benchmark tool (a bit like http://Godbolt.org/ but with timing instead of asm output), I wasn't aware of it. It looks like you actually benchmarked with `clang -O3` (good), but the benchmark links come up with optimizations disabled ([ridiculous and useless](https://stackoverflow.com/questions/32000917/c-loop-optimization-help-for-final-assignment/32001196#32001196), especially for C++ template functions that have the entire implementation as small functions calling other small functions). The ratio is actually *bigger* with -O3 than -O0, though. – Peter Cordes Apr 29 '18 at 05:58
1

@JMerdich Good answer. In a realistic situation is possible that checking if an element is invalid could be computationally more expensive which could outweigh the faster `erase`. – Chris Drew Apr 29 '18 at 09:17
Both of these algorithms need to move down all the remaining elements and then erase the empty space on the end. The difference is that the partial clean must move all the elements after the search key and the full clean only moves the valid elements. Suppose all the elements after the key were valid and both algorithms produced as much data? http://quick-bench.com/SZpfwb2x0uNnI3N2grysEvTms6Y – TrentP Apr 29 '18 at 16:40
@bipll I'm not sure where you're getting O(n^2), as my code makes 3 calls and they are all O(n) or better (`remove_if` reshuffles the items as it checks, but it's still linear). Chris makes a good point in that I assumed that `isInvalid` is cheap; Benjamin's code would be faster if it isn't. – J. Merdich Apr 30 '18 at 22:24
@bipll: Um, yes? I stated that several times in the answer. However, since any solutions must iterate through the vector to fill the gap, you might as well clean it if the cleaning is cheap. I'm just saying OP should consider it as well as what he is asking for. – J. Merdich May 02 '18 at 00:10

score 7 · Answer 3 · edited Apr 29 '18 at 06:11

7

This is an intuitive approach to the problem that keeps the looping structure - while it only performs a single pass, due to repeated erasing it's likely to be less efficient than a two-pass approach. For this more efficient approach, you should use Benjamin Lindley's answer.

Modifying the code in the answer to the first link you gave, it looks like something like this would fit your specification:

for (auto i = vec.begin(); i != vec.end();)
{
    if (*i == whatImLookingFor)
        return i;
    else if (isInvalid(*i))
        i = vec.erase(i);       // slow, don't use this version for real
    else
        ++i;
}

If we find the element you're searching for, we return an iterator to it.
If we find an invalid element along the way (but not after the desired element), we erase it.
We avoid iterator invalidation by keeping the iterator returned from erase.

You'll still need to handle the case where the element's not found, probably by returning vec.end().

edited Apr 29 '18 at 06:11

Peter Cordes

328,167
45
605
847

answered Apr 28 '18 at 17:58

hnefatl

5,860
2
27
49

2

Calling `std::vector::erase` multiple times within a loop is generally going to be quite inefficient. Each time it will cause all the elements after the erased item to be moved along by one element. You should prefer Benjamin Lindley's answer. – Chris Drew Apr 28 '18 at 18:52
3

@hnefatl: My answer also ignores elements after the one being searched for. – Benjamin Lindley Apr 28 '18 at 19:04
@BenjaminLindley Whoops, I missed the sneaky `el` in your second line. Neat answer, +1 from me! – hnefatl Apr 28 '18 at 19:05

Chris Drew · Answer 4 · 2018-04-29T16:40:32.423

As @BenjaminLindley and @JMerdich pointed out, for the problem stated, a two-pass approach is probably simpler and more efficient.

In a realistic situation, it is possible that there is some expensive calculation that needs to be done either to determine if an element is invalid or to determine if an element is the one we are looking for:

In which case a two-pass approach becomes less desirable because it causes us to do this expensive calculation twice.

But it is possible to do a single-pass approach without calling std::vector::erase multiple times within the loop. It is not too hard to write std::remove_if yourself, then we can get it to do both checks at the same time. We still just call std::vector::erase once at the end:

std::vector<T>::iterator 
findAndRemoveInvalid(std::vector<T>& vec, U whatImLookingFor) {

  // Find first invalid element - or element you are looking for
  auto first = vec.begin();
  for(;first != vec.end(); ++first) {
    auto result = someExpensiveCalculation(*first);
    if (result == whatImLookingFor)
      return first;  
    if (isInvalid(result))
      break;
  }
  if (first == vec.end())
    return first;

  // Find subsequent valid elements - or element you are looking for
  auto it = first + 1;
  for(;it != vec.end(); it++) {
    auto result = someExpensiveCalculation(*it);
    if (result == whatImLookingFor)
      break;
    if (!isInvalid(result)) {
      *first++ = std::move(*it);  // shift valid elements to the start
      continue;
    }
  }

  // erase defunct elements and return iterator to the element
  // you are looking for, or vec.end() if not found.
  return vec.erase(first, it);
}

Live demo.

It is clearly more complicated though, so measure performance first.

Ulrich Eckhardt · Answer 5 · 2018-04-28T17:58:44.727

1

If a simple loop, exited with break is too primitive for you, I'd suggest using std::find() to get an iterator to the searched element and then to use vector.erase()to delete the othere elements.

edited Apr 28 '18 at 17:58

answered Apr 28 '18 at 17:52

Ulrich Eckhardt

16,572
3
28
55

3

I think the point of OP's question is that they want to perform two operations in one pass - erasing invalid elements, then returning an element matching a predicate. This answer seems to be just about erasing the element matching the predicate. – hnefatl Apr 28 '18 at 17:55
Fixed that, the approach is basically the same, only the erased range is swapped. – Ulrich Eckhardt Apr 28 '18 at 18:01

Erase some of a vector's elements in a for-each loop without iterating the whole vector

5 Answers5

The code