3

Following one of the "deleting while iterating" patterns on a vector, I don't understand why this code works, or if it's making use of undefined behavior:

The Code:

#include <vector>
#include <iostream>

int main(int argc, char* argv[], char* envz[])
{
  std::vector<std::string> myVec;
  myVec.push_back("1");
  myVec.push_back("2");
  myVec.push_back("3");

  for (std::vector<std::string>::iterator i = myVec.begin();
       i != myVec.end();
       ++i)
  {
    if ("1" == *i)
    {
      std::cout << "Erasing " << *i << std::endl;
      i = myVec.erase(i);
      --i;
      continue;
    }
    std::cout << *i << std::endl;
  }

  return 0;
}

The Output:

>g++ -g main.cpp
>./a.out 
Erasing 1
2
3

Question:

Consider the first iteration of the for-loop:

  • i is myVec.begin(), which "points to" 1.
  • We enter the conditional block.
  • 1 is erased and i is set to one past the erased element, i.e. 2, which is now also pointed to by myVec.begin()
  • I decrement i, so now it points to...one prior to myVec.begin() ???

I'm confused by why this seems to work, as evidenced by the output, but something feels fishy about decrementing the iterator. This code is easy enough to rationalize if the conditional is if ("2" == *i), because the iterator decrement still places it at a valid entry in the vector. I.e. if we conditionally erased 2, i would be set to point to 3, but then manually decremented and thus point to 1, followed by the for-loop increment, setting it to point back to 3 again. Conditionally erasing the last element is likewise easy to follow.

What Else I Tried:

This observation made me hypothesize that decrementing prior to vector::begin() was idempotent, so I tried addition an additional decrement, like so:

#include <vector>
#include <iostream>

int main(int argc, char* argv[], char* envz[])
{
  std::vector<std::string> myVec;
  myVec.push_back("1");
  myVec.push_back("2");
  myVec.push_back("3");

  for (std::vector<std::string>::iterator i = myVec.begin();
       i != myVec.end();
       ++i)
  {
    if ("1" == *i)
    {
      std::cout << "Erasing " << *i << std::endl;
      i = myVec.erase(i);
      --i;
      --i;      /*** I thought this would be idempotent ***/
      continue;
    }
    std::cout << *i << std::endl;
  }

  return 0;
}

But this resulted in a segfault:

Erasing 1
Segmentation fault (core dumped)

Can someone explain why the first code bock works, and specifically why the single decrement after erasing the first element is valid?

StoneThrow
  • 5,314
  • 4
  • 44
  • 86
  • 4
    Something using undefined behavior doesn't guarantee that it will crash - it may sometimes appear to work correctly. – interjay Feb 14 '17 at 22:58
  • Related: http://stackoverflow.com/questions/18225651/is-stdvectorbegin-1-undefined and http://www.cplusplus.com/forum/general/84609/ – qxz Feb 14 '17 at 22:58
  • 3
    It doesn't really work, but one possible effect of undefined behavior is "seems to work". It is not *totally* surprising if `--i; ++i;` results in `i` being unchanged. – Bo Persson Feb 14 '17 at 23:01
  • 1
    Why would you think `--i` would be idempotent?! Anyway, you would save a lot of time if you just tried the code with an implementation that supports debugging iterators, [like this](http://coliru.stacked-crooked.com/a/d07c9bcb2d285a54), and see it abort with a helpful error message. – Jonathan Wakely Feb 14 '17 at 23:10
  • @JonathanWakely Just to answer your question, I wondered whether `--i` was idempotent specifically when it pointed to `vector::begin()`, or whether it pointed to some magical "pre-`begin()`" holding spot from where it could legitimately be incremented back to `begin()`. But that's neither here nor there: this code relies on undefined behavior it seems. – StoneThrow Feb 14 '17 at 23:19

3 Answers3

6

No, your code has undefined behaviour: if i == myVec.begin(), then i = myVec.erase(i); results in i again being (the new value of) myVec.begin(), and --i has undefined behaviour since it goes outside the valid range for the iterator.

If you don't want to use the erase-remove idiom (i.e. myVec.erase(std::remove(myVec.begin(), myVec.end(), "1"), myVec.end())), then the manual loop-while-mutating looks like this:

for (auto it = myVec.begin(); it != myVec.end(); /* no increment! */) {
  if (*it == "1") {
    it = myVec.erase(it);
  } else {
    ++it;
  }
}

Regardless, the crucial point both here and in your original code is that erase invalidates iterators, and thus the iterator must be re-assigned with a valid value after the erasing. We achieve this thanks to the return value of erase, which is precisely that new, valid iterator that we need.

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • No, as stated in my answer, this is due to the `++i` done by the `for` loop. – Telokis Feb 14 '17 at 23:03
  • 1
    But there's nothing left where the erased value was, so there's nothing before `i`, and so `--i` is undefined. Because it goes outside the valid range. It sounds like you're trying to argue around undefined behaviour. You can't do that. It's undefined, stop it. – Jonathan Wakely Feb 14 '17 at 23:09
  • @Kerrek, I agree your code sample is the more robust way of handling this situation, but I was still curious *what was wrong* with my original code. I think your + others' answers indicates that the problem is due to the undefined behavior of decrementing past vector::begin(). – StoneThrow Feb 14 '17 at 23:09
  • @JonathanWakely - Not trying to "argue around undefined behavior." Trying to confirm that undefined behavior is what is going on here. That is my suspicion, so I am turning to the community for confirmation. – StoneThrow Feb 14 '17 at 23:11
  • 1
    Try turning on debugging checks instead: http://coliru.stacked-crooked.com/a/d07c9bcb2d285a54 (you'd have got the answer faster). – Jonathan Wakely Feb 14 '17 at 23:11
1

This might work in some compilers, but might fail in others (e.g. the compiler might actually check in runtime that you are not decrementing under begin() and throw exception in such case - I believe that at least one compiler does it but don't remember which one).

In this case the general pattern is to not increment in the for but inside the loop:

  for (std::vector<std::string>::iterator i = myVec.begin();
       i != myVec.end();
       /* no increment here */)
  {
    if ("1" == *i)
    {
      std::cout << "Erasing " << *i << std::endl;
      i = myVec.erase(i);
      continue;
    }
    std::cout << *i << std::endl;
    ++i;
  }

With vector the wrong iteration might actually work in more cases, but you'd have very bad time if you try that e.g. with std::map or std::set.

EmDroid
  • 5,918
  • 18
  • 18
-1

The key here is the continue right after decrementing. By calling it, ++i will be triggered by the loop iteration before dereferencing i.

Telokis
  • 3,399
  • 14
  • 36
  • 2
    It isn't safe, it's undefined behavior. On many platforms nothing will happen, but on others it will crash (for example, some standard libraries do bounds checking in debug mode). – interjay Feb 14 '17 at 23:00
  • Why is that ? I don't understand how it can be undefined for pointers – Telokis Feb 14 '17 at 23:05
  • exactly the same reason it's undefined for iterators – M.M Feb 14 '17 at 23:11
  • Alright, I don't really get it but I erased that part. But I think no one answered the original question : `Can someone explain why the first code bock works, and specifically why the single decrement after erasing the first element is valid?` He really is asking why, not only wondering if it's "ok" – Telokis Feb 14 '17 at 23:13
  • @Ninetainedo - you're correct about my wondering about the first vs. second code block. But I think the answer comes down to undefined behavior. – StoneThrow Feb 14 '17 at 23:15
  • Yes, I totally agree on that point. I just learned about that as well. The more I know. – Telokis Feb 14 '17 at 23:16