11

EDIT: I have had a lot of answers telling me that I should separate the deletion into another loop. Perhaps I did not make it clear enough, but I stated in my last paragraph that I'd like to find a solution OTHER than that. ie keeping the current code structure, but using some little-known C++fu to make it work.

Well, I know that calling erase() on a vector invalidates iterators to the element and all those after it, and that erase() returns an iterator to the next valid iterator, but what if the erase happens elsewhere?

I have the following situation (simplified):

WARNING: Do NOT assume that this is the entire code. What is shown below is EXTREMELY simplified to illustrate my problem. All the classes and methods shown below are actually far more complex.

class Child {
   Parent *parent;
}

class Parent {
   vector<Child*> child;
}

void Parent::erase(Child* a) {
   // find an iterator, it, that points to Child* a
   child.erase(it);
}

int Child::update() {
   if(x()) parent.erase(*this) // Sometimes it will; sometimes (most) it won't
   return y;
}

void Parent::update() {
   int i = 0;
   for(vector<A>::iterator it = child.begin(); it != child.end(); it++)
      i += (*it)->update();
}

So, obviously, it will crash after it runs (*it)->update() if x() returns true, because when it does, the Child will tell the Parent to remove it from the vector, invalidating the iterator.

Is there any way of fixing this other than making Parent::erase() pass an iterator all the way back up to Parent::update()? This would be problematic, as it is not called for every call to Child::update(), and thus that function would need a way to return an iterator to itself every single other time, and it is also currently returning another value. I would also prefer to avoid some other similar way to separate the erasing the process from the updating loop.

Infiltrator
  • 1,611
  • 1
  • 16
  • 25
  • 2
    could you change the container to `std::list`? If so the iteration increment operation I had in my deleted answer would work... Of course there may be other side effects which are not apparent from your posted code... – Nim May 23 '11 at 11:39
  • Are you sure you should be using `std::vector`? It has fast random access, but slow insert and delete, so if you are going to do the later a lot and usually iterate over everything anyway, `std::list` will be better. – Jan Hudec May 23 '11 at 11:42
  • @Nim and @Jan Hudec: Actually, the erasing happens (relatively) rarely, and the most common operators on it are iteration over the lot, and random access, which is why I chose vector; unless there is a superior alternative? EDIT: Oh, and it doesn't matter where new elements get added; what the vector says is "this is all the stuff I own". – Infiltrator May 23 '11 at 11:51
  • FWIW, Qt has a "deleteLater()" on their objects, showing that a common way to address this issue is to defer the deletion out of the update. – Macke May 23 '11 at 12:20
  • @Macke: Well, I was hoping for a "cool" alternative, but I guess I'll have to go with either the deleteme or second loop method. – Infiltrator May 23 '11 at 12:31
  • 1
    STL Iterator invalidation rules: https://stackoverflow.com/questions/6438086/iterator-invalidation-rules – ideawu Jan 20 '18 at 02:50

8 Answers8

5

I recommend that you restructure your code to not mix the two different actions of updating (by deleting certain elements) and aggregating (by adding up the values) the data.

You could do this by changing the return-value of Child::update to something like std::pair<int, bool>, where the int is the value and the bool indicates if this element should be deleted.

If you can make Child::update a const method (meaning it does not modify the object, and only calls other const methods), you could write a simple functor that you can use with std::remove_if. Something like this:

class update_delete {
public:
    update_delete() : sum(0) {}
    bool operator()(const Child & child) {
        std::pair<int, bool> result = child.update();
        sum += result.first;
        return result.second;
    }
private:
    int sum;
}

If you cannot make update it const, just swap the element with some element from the back (you would have to keep an iterator that always points to the last element that is available for swapping). When you aggregation is done, just discard the end of the vector (that now contains all the elements that are to be deleted) using vector::resize. This is analogous to using std::remove_if, but I am not sure if it is possible/valid to use it with a predicate that modifies the objects in the sequence.

Björn Pollex
  • 75,346
  • 28
  • 201
  • 283
  • I see what you mean by separating the aggregation and deletion. However, I'd rather not do it that way, if there is a way of doing the way I want. Regarding the second part of your answer, though, the update method, by its definition, can't be a const; it needs to UPDATE things, after all. :P – Infiltrator May 23 '11 at 11:27
  • @Tim: I suspected as much, but it was not clear from your example. – Björn Pollex May 23 '11 at 11:28
5

You can't really iterate over and mutate a std::vector at the same time unless there's some communication between the iteration that the mutation.

I've seen other, non-standard, containers facilatate this through "smart" iterators that know when their value has been erased (and maybe auto-jump to the next item). It's quite a bit more book-keeping though.

Macke
  • 24,812
  • 7
  • 82
  • 118
  • Now that's what I like to hear. Which containers would these be? – Infiltrator May 23 '11 at 12:16
  • @Tim: Those aren't available in the STL (forgot to emphasize that, now fixed). I've seen them for Java, and you could certainly write them yourself in C++ as a fun exercise. OTOH, there will be more overhead for creating/incrementing/destroying iterators, so walking the list twice _will be_ the better option. – Macke May 23 '11 at 12:19
  • That is NOT what I like to hear. :P Well, I was hoping for a "cool" alternative, but I guess I'll have to go with either the deleteme or second loop method. – Infiltrator May 23 '11 at 12:29
  • @Tim: Yeah. Sorry about that. :-p I'd guess those "tracking" iterators are just to bloated to match the taste of the C++ community. Also, pulling the floor out from beneath an iterator _is_ a tricky operation, since it invalidates too much. Note that using a lot of shared_ptr's might help, but those haven't been in the standard for too long. – Macke May 23 '11 at 14:00
  • Yeah, I'm not really too familiar with shared_ptr, but I think it might be better to switch to those in this case. Then again, without knowing them too well, I'll end up using them incorrectly. On the other foot, I won't learn how to use them unless I do. :P – Infiltrator May 23 '11 at 14:35
4

If you can communicate both erase-intent and id out of your update-function, then you can do it like this:

std::tuple<int, bool> Child::update() {
   auto erase = x();
   return {y, erase};
}

void Parent::update() {
   int i = 0;

   for(vector<A>::iterator it = child.begin(); it != child.end();) {
      auto [y, erase] += (*it)->update();
      i += y;

      if (erase) {
         it = child.erase(it); // erase returns next iterator
      } else {
         ++it;
      }
   }
}
Macke
  • 24,812
  • 7
  • 82
  • 118
  • As I said in my question, I'd rather avoid doing this, and was asking if there is any other way of doing it. Thanks, anyway. – Infiltrator May 23 '11 at 12:08
  • @Tim: I thought you wanted to avoid sending the iterator out, or walking the list twice. – Macke May 23 '11 at 12:11
  • My apologies if my question is misleading. The problem is that Parent::erase() actually needs to do more than simply remove the element from the vector. I suppose I could keep the rest of this book-keeping in a separate function and call it before I call child.erase(). I will keep that in mind; thank you. However, I like the sound of your other solution at this point. – Infiltrator May 23 '11 at 12:24
2

One basis for a solution (as others have said) is to switch from std::vector to std::list, since you can then erase nodes from a list without invalidating references to other nodes.

But vectors tend to have a lot better performance than lists, due to much better locality of reference and also add a lot of memory overhead (in the per node prev and next pointers, but also in the form of overhead per allocated block in your system allocator).

What I did, then, with some similar requirements, is to stick with a vector, but to allow holes or 'dead entries' in your vector when elements are deleted.

You'll need to be able to detect and skip dead entries in your vector during in-order iteration, but you can collapse the vector to remove these dead entries periodically (or just after the specific iteration loop that deletes elements has completed). You can also (if necessary) include a freelist for reusing dead entries for new children.

I describe this setup in more detail in the following blog post:: http://upcoder.com/12/vector-hosted-lists/

A couple of other setups that I talk about in that blog post that could be relevant to these kinds of requirements are a 'vector hosted' linked list and a linked list with custom memory pool.

Thomas Young
  • 193
  • 3
  • 9
2

How about adding the children to delete into a list, and delete them after you've updated every child.

In effect you'll defer deletion until after the first loop.

Skurmedel
  • 21,515
  • 5
  • 53
  • 66
  • This will just shift the same problem to the second pass. When you erase the first element in the list, all the remaining iterators in the list become invalid. – Jan Hudec May 23 '11 at 11:25
  • The same as what Space_C0wb0y said; good answer nevertheless. However, I'd rather not do it that way, if there is a way of doing the way I want. – Infiltrator May 23 '11 at 11:28
  • Correct algorithm: add children to delete _in order_, erase _in reverse order_. (or sort them). By erasing in reverse order, you don't invalidate iterators that are yet to be erased. – MSalters May 23 '11 at 11:35
  • @Jan Hudec: no... You could have a second list, and iterate it; or do it like MSalters suggest. – Skurmedel May 23 '11 at 11:51
  • 1
    @Jan Hudec, @MSalters, and @Skurmedel: Yeah, that's how I understood it; build a separate list, perhaps another vector that holds the same pointers, and then call Parent::erase() on each of those pointers. That way, no elements are actually erased from the second list, and its iterators will stay valid. The second list can then safely be disposed of, as it was only holding pointers. Having said that, I'd rather avoid this method, if possible. – Infiltrator May 23 '11 at 11:56
2

I'm unsure of all you're design constraints / goals, but you could expose the need to delete a child through it's public API and just make Parent::update conditionally do the delete.

void Parent::update()
{
    int i = 0;
    for( vector<A>::iterator it = child.begin();
         it != child.end(); /* ++it done conditionally below */ )
    {
        i += (*it)->update();
        if( (*it)->should_erase() )
        {
            it = child.erase( it );
        }
        else
        {
            ++it;
        }
    }
}

bool Child::should_erase() const
{
    return x();
}

int Child::update()
{
    // Possibly do other work here.
    return y;
}

Then perhaps you can remove Parent::erase.

Charles L Wilcox
  • 1,126
  • 8
  • 18
  • Similar to what Macke said, yes. However, the problem is that Parent::erase() actually needs to do more than simply remove the element from the vector. I suppose I could keep the rest of this book-keeping in a separate function and call it before I call child.erase(). I will keep that in mind; thank you. – Infiltrator May 23 '11 at 12:25
0

try the modified version below:

   for(vector<A>::iterator it = child.begin(); it != child.end();)
      i += (*it++)->update();

EDIT: This is of course horribly broken, however it would work if you could change the container to std::list. Without that change, you could try a reverse iterator (if order did not matter), i.e.

   for(vector<A>::reverse_iterator it = child.rbegin(); it != child.rend();)
      i += (*it++)->update();
Nim
  • 33,299
  • 2
  • 62
  • 101
  • 1
    I think the point is that the parent iterator is invalid sometimes after `child->update` runs. Moving the increment operator won't change that. – sje397 May 23 '11 at 11:06
  • @sje Yes; that is correct. The iterator is sometimes invalidated by the call to (*it)->update(); not most of the time, but it still occurs. I'll edit my question if it is unclear? – Infiltrator May 23 '11 at 11:09
  • @sje397, here is my simple example: http://www.ideone.com/zYLca, I think it's emulating what the OP is doing as I understand his code... – Nim May 23 '11 at 11:11
  • @sje397: Added an explanation of why the crash occurs, so people don't get thrown off. – Infiltrator May 23 '11 at 11:11
  • @Nim: doesn't this line: "foo.erase(it++);" crash it? From what I understand, the call to erase() will invalidate the iterator, it, which you then attempt to increment; but it is already invalid, so attempting to increment it will only result in a crash; or worse: a bug. – Infiltrator May 23 '11 at 11:14
  • @Nim: The point is that `erase` invalidates the iterator. You can't use it after that, like you are doing. "[erase] invalidates all iterator and references to elements after position or first." from http://www.cplusplus.com/reference/stl/vector/erase/ – sje397 May 23 '11 at 11:15
  • @Tim, it's calling post-increment, which means that `erase()` is actually called with the previous value of iterator `it` even though `it` has just been incremented. The thing I'm not sure about is whether this is invoking UB, and if so, why isn't ideone seeing a crash... – Nim May 23 '11 at 11:18
  • @Nim: vector::erase invalidates *all* iterators (unlike list, where it only invalidates the erased one). The only chance is using the iterator returned from erase. – Jan Hudec May 23 '11 at 11:23
  • @Nim: Yeah; that's what I meant. It's in effect the same as erase(it); it += 1;, is it not? Which should mean that it is still invalid. However, the code snippet you posted works... even though it shouldn't. I'm really confused here. What's UB? EDIT: Ah, right, undefined behaviour. Yeah, figures. But it doesn't work for when I'm using classes and pointers. – Infiltrator May 23 '11 at 11:48
  • @Tim, did you try the reverse iteration? – Nim May 23 '11 at 12:25
  • @Nim: Sorry, reverse iteration is just as broken. The vector occasionally decides to free memory at which point *all* iterators are invalidated and accessing them causes segmentation fault. – Jan Hudec May 23 '11 at 13:45
  • @Tim: It *sometimes* works. The vector operations invalidate the iterators to the point they cause crashes only if they decide to allocate or free memory, which is only when you get to double/half size of the initial vector. In the other cases, erase shifts the iterators past the erased element by one. – Jan Hudec May 23 '11 at 13:48
  • @Jan, really? I thought vectors only ever grow and will only shrink if you swap them with something else... – Nim May 23 '11 at 13:49
  • I think that reverse iteration should works (assuming you never add items during update). Maybe an intermediate iterator variable is required: `itDel = it; ++it; itDel->update();` – Phil1970 Aug 06 '19 at 20:46
0

I think the problem is, that your method "update" also invalidates iterators in the exact same way that std::vector::erase does. So I recommend you just do the exact same thing: return the new, valid iterator and apdapt the calling loop accordingly.

b.buchhold
  • 3,837
  • 2
  • 24
  • 33
  • See the last two paragraphs of my question. – Infiltrator May 23 '11 at 11:57
  • @Tim sorry i somehow missed that. I think the problem is that "Child" has a dependency to "Parent". The method "Child::update" as somewhat strange semantics as it actually updates the parent and not the Child. You could replace it with something Like "bool shouldBeErased()" and deal with erasing from the vector in the class that actually has that member, "Parent". – b.buchhold May 23 '11 at 12:05
  • Sorry; the example is extremely simplified to illustrate my problem, and Child::update actually does a _lot_ to the Child. I'll edit my question to make this clearer. And as I said in my last paragraph, I want to know if there is any way of doing it other than that. Is my question really that confusing? :\ – Infiltrator May 23 '11 at 12:10
  • @Tim: still, updated() could do the updates and signal if it needs to be deleted to the partent. for exmaple via return value, setting a member, and so on. I still think that deleting from the vector should not be responsibility of "Child" – b.buchhold May 23 '11 at 12:20
  • Think of it as the Child declaring independence from the Parent, and thus no longer being "owned" by the Parent. At least, that's what it does, from a design point of view. From the looks of it, though, I suck at coding and will have to rewrite. – Infiltrator May 23 '11 at 12:28