31

All over the web I see people use the erase/remove idiom for C++ vectors like so:

#include <vector> // the general-purpose vector container
#include <iostream>
#include <algorithm> // remove and remove_if
int main()
{
  // initialises a vector that holds the numbers from 0-9.
  std::vector<int> v = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

  // removes all elements with the value 5
  v.erase( std::remove( v.begin(), v.end(), 5 ), v.end() );

  return 0;
}

That is, if I want to erase all elements matching some criteria (e.g. the number 5 from a vector of ints), then I use std::remove or std::remove_if in conjunction with vector.erase like so:

vector.erase( std::remove( vector.begin(), vector.end(), <some_value>), vector.end());

This works nicely in general; std::remove (and remove_if) will copy (or use move semantics in C++11) the elements that are to be deleted over to the end of the vector, so the vector from our previous example will now look like this:

{ 0, 1, 2, 3, 4, 6, 7, 8, 9, 5 };

With the element 5 bolded because it's been moved to the end.

Now, std::remove will return an iterator to it, which we then use in erase to clear the elements out. Nice.

But what about the following example?

int main()
{
  // initialises an empty vector.
  std::vector<int> v = {};

  // removes all elements with the value 5
  v.erase( std::remove( v.begin(), v.end(), 5 ), v.end() );

  return 0;
}

This seems to work as expected (not erasing anything, not segfaulting, etc.) on all platforms I run it on, but I know that just because something is working, doesn't mean it's not undefined behavior.

The quick reference for vector.erase says this (emphasis mine):

iterator erase (const_iterator first, const_iterator last);

first, last are

Iterators specifying a range within the vector] to be removed: [first,last). i.e., the range includes all the elements between first and last, including the element pointed by first but not the one pointed by last. Member types iterator and const_iterator are random access iterator types that point to elements.

So is vector.erase(vector.end(),vector.end()) undefined behavior?

Here's what the quick reference says about exception safety:

If the removed elements include the last element in the container, no exceptions are thrown (no-throw guarantee). Otherwise, the container is guaranteed to end in a valid state (basic guarantee). An invalid position or range causes undefined behavior.

So, the answer, at least to me appears to be "YES", and this StackOverflow answer seems to support it.

Therefore, is the common idiom wrong?

Assuming it's undefined behavior, then any call to remove could return an iterator to vector.end() which should be checked before calling vector.erase, and calling remove on an empty vector does seem to return vector.end: (IDEOne for code below)

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

int main() {
   vector<int> myInts;
   auto anIter = std::remove(myInts.begin(),myInts.end(),5);
   if (anIter == myInts.end())
      std::cout << "iterator = myInts.end()";
}

Finally, my question:

Should the actual remove/erase idiom be this?

auto endOfRangeIterator = std::remove(vector.begin(), vector.end(), <value>);
if (endOfRangeIterator != vector.end())
   vector.erase(endOfRangeIterator, vector.end())
Community
  • 1
  • 1
AndyG
  • 39,700
  • 8
  • 109
  • 143
  • 2
    No, an empty range is fine. `[first, first)` is an empty set. – BoBTFish May 20 '14 at 13:36
  • *std::remove (and remove_if) will copy (or use move semantics in C++11) the elements that are to be deleted over to the end of the vector* Actually, they move elements that are **not** to be deleted to the **front** of the vector. The contents of the vector past the iterator returned are indeterminate. – Igor Tandetnik May 20 '14 at 13:39
  • @BoBTFish I agree that it should be fine, but according to the standard is it? The quick ref mentions that `vector.erase` will always erase the first element given. A naïve implementation of it could perhaps cause undefined behavior. – AndyG May 20 '14 at 13:42
  • @AndyG: http://en.cppreference.com/w/cpp/container/vector/erase (generally better than cplusplus.com) has the same definition as the standard (removes stuff from the range [f,l) ) – Mat May 20 '14 at 13:45
  • In your "Finally, my question" section, you are using the single argument version of erase. For that case, you must be sure that the iterator is dereferenceable. But the remove/erase idiom uses the range erase version, which requires 2 iterators, neither of which need to be dereferenceable if they denote an empty range. – Benjamin Lindley May 20 '14 at 13:50
  • the quotation is correct and contains the answer: "including the element pointed by first but not the one pointed by last" – 4pie0 May 20 '14 at 14:08
  • 1
    @privatedatapublicchannel2: Actually, the quotation is not correct. In the case where `first==last`, the quote contradicts itself, specifying that an element is removed, but not removed. Which explains why AndyG came to believe it was undefined behavior. – Benjamin Lindley May 20 '14 at 14:13
  • @BenjaminLindley why? first points to end which is not an element of vector thus is not included. This is same as what C++ Standard says in24.2.1/7: 'range [i,j) refers to the elements in the data structure starting with the element pointed to by i and up to but not including the element pointed to by j" – 4pie0 May 20 '14 at 14:15
  • 1
    @privatedatapublicchannel2: It's not the same. The standard doesn't actually say that the element pointed to by `i` is included in the range. It just says that's where the range starts. It does, however, definitely say that the element pointed to by `j` is *not* included. Also, the second iterator passed to `erase` does not necessarily have to be the end iterator, so if `first==last`, that does not necessarily imply that first points to end. – Benjamin Lindley May 20 '14 at 14:24
  • 1
    @privatedatapublicchannel2: To be clear, read the quote from cplusplus.com, replacing "last" with "first". *"including the element pointed to by first, but not the one pointed to by first"*. The statement is clearly a contradiction, in that case. – Benjamin Lindley May 20 '14 at 14:29
  • @BenjaminLindley cplusplus says that (end,end) is a range that includes all the elements between end and end, including the element pointed by end but not the one pointed by end, which must be of type range [end,end) which is correct – 4pie0 May 20 '14 at 14:31
  • @privatedatapublicchannel2: A statement which contradicts itself cannot be correct. How can an element be included but not included? – Benjamin Lindley May 20 '14 at 14:36
  • this is element **pointed by**, since end doesn't point to element there is no elements in range – 4pie0 May 20 '14 at 14:46
  • @privatedatapublicchannel2: You're assuming that the iterators are end iterators. That need not be the case. -- http://ideone.com/rcYPev -- In that code example, is the first element erased? If I go by the cplusplus.com documentation, it tells me "it is but it isn't", which is a contradiction. – Benjamin Lindley May 20 '14 at 14:51
  • @BenjaminLindley yes, in this case this is not correct, you are right – 4pie0 May 20 '14 at 14:57
  • 1
    Eventually we will [erase_if](http://stackoverflow.com/a/33523825/1708801) which will simplify this whole thing. – Shafik Yaghmour Nov 10 '15 at 03:18

3 Answers3

28

24.2.1/7 Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges. A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an empty range; in general, a range [i,j) refers to the elements in the data structure starting with the element pointed to by i and up to but not including the element pointed to by j.

Emphasis mine.

Further, the description of erase you cite is not the normative text in the standard. The standard has this to say (Table 100):

a.erase(q1,q2)

Effects: Erases the elements in the range [q1, q2).

This doesn't require that q1 be dereferenceable. If [q1, q2) is an empty range (per 24.2.1/7), then no elements are in the range, and so none are erased.

Igor Tandetnik
  • 50,461
  • 4
  • 56
  • 85
  • Thank you for the clarification. The definition of empty range according to the standard does wonders to clear things up; otherwise I think it would be ambiguous. – AndyG May 20 '14 at 14:14
6

I think the more important in your cite is:

Iterators specifying a range within the vector] to be removed: [first,last). i.e., the range includes all the elements between first and last, including the element pointed by first but not the one pointed by last. Member types iterator and const_iterator are random access iterator types that point to elements.

As we have found in comments, this citation from cpluspluc.com is incorrect. This will not violate the rules in case of ( v.end, v.end) but will be incorrect in case of

#include <vector>

int main()
{
    std::vector<int> v = { 1, 2, 3 };

    v.erase( v.begin(), v.begin());
}

because statement that contradicts itself with

the range includes (...), including the element pointed by v.begin() but not the one pointed by v.begin().

cannot be a valid statement.

C++ Standard n3337 in § 23.2.2 Sequence containers requirements Table 100 specifies that

a.erase(q1,q2) returns iterator . And note is:

Requires: For vector and deque, T shall be MoveAssignable. Effects: Erases the elements in the range [q1, q2).

And this is what it says about the range [i,j) in § 24.2.1/7 Iterator requirements

Most of the library’s algorithmic templates that operate on data structures have interfaces that use ranges. A range is a pair of iterators that designate the beginning and end of the computation. A range [i,i) is an empty range; in general, a range [i,j) refers to the elements in the data structure starting with the element pointed to by i and up to but not including the element pointed to by j. Range [i,j) is valid if and only if j is reachable from i. The result of the application of functions in the library to invalid ranges is undefined.

Thus to answer your questions

But what about the following example?

cplusplus.com is wrong in this case

So is vector.erase(vector.end(),vector.end()) undefined behavior?

No, no undefined behavior is triggered.

Therefore, is the common idiom wrong?

No, it is correct.

Should the actual remove/erase idiom be this?

There is no need for this, though it is also fine.

4pie0
  • 29,204
  • 9
  • 82
  • 118
4

So is vector.erase(vector.end(),vector.end()) undefined behavior?

No. Because of the statement right next to the one you emphesized:

Iterators specifying a range within the vector] to be removed: [first,last). i.e., the range includes all the elements between first and last, including the element pointed by first but not the one pointed by last.

So, vector.erase(vector.end(),vector.end()) does not attempt to erase vector.end() because it is pointed to by parameter last.

Granted, this definition is ambiguous and those statements can be interpreted as contradictory. The quoted wording is not used by the standard.

eerorika
  • 232,697
  • 12
  • 197
  • 326