11
std::vector<int> a;
a.push_back(1);
a.push_back(a[0]);

I just learned that the code above can be very dangerous.

(If it's not obvious why, you're not alone... it wasn't obvious to me either.)

My questions:

  1. What is the "standard" way of dealing with it? Making a new variable and then assigning it immediately to something afterward seems a bit weird to me. Is there a better way of dealing with it?

  2. How do you train yourself to watch out for aliasing issues like this? What pattern(s) do you look for? I have no idea to recognize this situation; I only learned about aliasing when I learned about the restrict keyword in C, and only now do I understand what the issue really is.


Edit:

I'd love to accept an answer, but it doesn't seem like part (2) of the question has been answered. I'm wondering what strategies people use to locate aliasing mistakes in code they have written.

One strategy I've come up with so far is to avoid passing in the same value for in two parameters. (In this case, one parameter is implicit and one explicit.)

Are there any other easy things to notice and watch out for?

Community
  • 1
  • 1
user541686
  • 205,094
  • 128
  • 528
  • 886
  • 2
    This is guaranteed to work in C++03, in g++ when the vector grows, the first element that is copied is the new one, so that if that throws an exception the original container is left untouched (12.1/10). – David Rodríguez - dribeas Jun 02 '11 at 08:38
  • @DavidRodríguez-dribeas How does that prevent invalidation of the _reference_ to `a[0]`? – sehe May 30 '12 at 09:55
  • @sehe: On the order of the implementation: any reference to `a[0]` will be invalidated during the operation, but it will be invalidated *after* the element is copied, which means that at the time of access to `a[0]` (as parameter to `push_back`) that object is still alive. Additionally the exception guarantees require that all elements are copied to the new location (including the newly inserted) before freeing the old buffer, as the container cannot guarantee that copy construction will not throw, and if it throws the vector must be left in the original state. – David Rodríguez - dribeas May 30 '12 at 13:58
  • @DavidRodríguez-dribeas: Doesn't the mere *existence* of an invalid reference/pointer imply UB in C++, regardless of if it's dereferencedd? – user541686 May 30 '12 at 14:05
  • @Mehrdad: Then every program would be invalid, since after `delete` the pointer is invalid but it still exists. The UB is not that on the existence of the invalid reference/pointer, but rather in accessing it. – David Rodríguez - dribeas May 30 '12 at 14:22
  • @DavidRodríguez-dribeas: That's a great point, but what about questions [like this one](http://stackoverflow.com/a/3839023/541686)? (Perhaps after you use `delete`, you may not access the pointer anyway, so theoretically it could be zero'd by `delete` and hence made safe?) I've seen this question pop up more than once... and when I expressed my surprise someone told me I'm indeed wrong, and that the only valid pointers are those that point to an array or 1 element past. – user541686 May 30 '12 at 14:30
  • @DavidRodríguez-dribeas: [Another example](http://stackoverflow.com/questions/8804612/#comment10986186_8804896)... ah, [here](http://stackoverflow.com/q/10473573/541686) is the one I was referring to. – user541686 May 30 '12 at 14:47
  • @Mehrdad: I'll try to read those when I get more time, but in the mean time read 3.8/6 of the standard. – David Rodríguez - dribeas May 30 '12 at 15:01

5 Answers5

5

EDIT: Technically, the standard does not mandate that this is correct if the contained type has a no-throw copy constructor. I don't know any implementation where this does not hold anyway, as it would require producing two implementations of push_back when the generic one is just as efficient in all cases.

Aliasing is a problem in general, but not in this particular case. The code:

assert( v.size() > 0 );
v.push_back( v[0] );

Is guaranteed to be correct by the standard (C++03) through the exception guarantees (which are a really good reason not to implement your own containers, you will probably not get them right). In particular §23.1 [lib.container.requirements] / 10 dictattes:

Unless otherwise specified (see 23.2.1.3 and 23.2.4.3) [NOTE: both those references refer to insert on deque and vector respectively] all container types defined in this clause meet the following additional requirements:

— if an exception is thrown by a push_back() or push_front() function, that function has no effects.

Where the important bit is that if any exception is thrown in the operation, the container is left untouched, and that means that no iterator gets invalidated, which in turns means that the original region of memory is left untouched until it is guaranteed that no exceptions will be thrown (with the exception pun intended, of destructors). Because in general copy constructors can throw, the implementation must ensure that all copies are performed before destroying any object.

This becomes more evident in C++0x, when objects are not copied from one location to another, but rather moved. Because the copy of the new element might throw, it has to be performed before any of the moves are executed, or else you would be left in a situation where some of the objects in the original container have been invalidated.

Community
  • 1
  • 1
David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
  • @Johannes raises a good point in his comments, that the library is allowed to determine (via type traits), that `int` has a non-throwing constructor, so the exception guarantee is trivially met and places no restrictions on the implementation of `push_back`. – Ben Voigt Jun 02 '11 at 13:44
  • @Ben Voigt: That is a good and valid point, but it would mean that the implementation would have to provide two different implementations just to ensure that it can mess up your code if the type does not throw... so from a practical standpoint, I don't see any implementor doing it :) A similar slightly worse situation occurs in C++0x with types that implement move semantics, as moving the elements is a destructive operation: if a move constructor throws, you are basically screwed with a container where some of the elements have been moved and others haven't, kind of like with destructors. – David Rodríguez - dribeas Jun 02 '11 at 16:46
  • 1
    @DavidRodríguez-dribeas: Either you look at what the standard says or you look at what you think is reasonable. The standard requires that in case of an exception nothing happens, this however doesn't imply that destruction must be done after copy-construction of the passed object. Once it's known that copy constructing the element won't throw (e.g. because is an int) old storage destruction can be anticipated. Either you think Johannes is wrong or if he's right then a compliant implementation **may** have undefined behavior while doing `v.push_back(v[0]);` and that's not safe C++ code. – 6502 Oct 02 '11 at 21:53
  • @AlfP.Steinbach: Do you care to explain what is incorrect? There is a corner case I did not considered (pointed out by 6502) by which the implementer could potentially reorder the code *after* verifying that copy construction will not throw, and I have to agree that technically that is the case --whether the implementer would want to invest in providing a different solution for non-throw constructors, adding code unnecessarily when the general solution works is a different point. But your comment *all incorrect* seems to indicate that you have greater concerns than that, do you? – David Rodríguez - dribeas Oct 03 '11 at 10:33
  • @DavidRodríguez-dribeas: Also after discussing this in the chat Alf P. Steinbach brought another important point (http://chat.stackoverflow.com/transcript/message/1594676#1594676). If you pass a function a reference to an object that doesn't live long enough then you're already in UB land and the C++ implementation is not required to behave in any prescribed way. `v[0]` is not going to live long enough in case of reallocation and therefore the implementation is free to do whatever it wants to do. – 6502 Oct 03 '11 at 10:36
  • @6502: You are in UB when *using* a reference to an object that is no longer *alive*, but if you only use that reference when the object is *alive*, then there is no UB – David Rodríguez - dribeas Oct 03 '11 at 11:19
  • @DavidRodríguez-dribeas: Since it's **possible** that in a **compliant implementation** of `std::vector::push_back` deallocation is done before copying then it's **possible** that in a **compliant implementation** `v.push_back(v[0])` will be UB. Therefore `v.push_back(v[0])` is not safe in C++. It may be safe in a specific implementation that takes care of it and where this is documented by the vendor, but it's not safe in general for the C++ language. Code relying on that working is at least non portable. – 6502 Oct 03 '11 at 12:16
  • @David: as I mentioned in the chat, I (sorry) did not fully grok your position before reacting. however, my gut feeling is often correct, and in this case it turned out to be also formally correct for c++11. the key for the formal pov is the phrase "Unless otherwise specified", which lists the discussion of `std::vector::push_back` as one of those otherwise specified areas. – Cheers and hth. - Alf Oct 03 '11 at 14:26
4

I guess this would be safe:

std::vector<int> a(1);
a.push_back(1);
a.push_back(int(a[0]));
Ben Voigt
  • 277,958
  • 43
  • 419
  • 720
  • 1
    @James: Of course, it runs afoul of a VC++ bug where an identity cast doesn't create a prvalue temporary. – Ben Voigt Jun 02 '11 at 05:25
  • That's actually something I'd thought of, but I wasn't sure if it created a temporary variable or not... it seems like a pretty "obvious" optimization for the compiler to make (mistakenly). – user541686 Jun 02 '11 at 05:29
  • 3
    The standard guarantees through the exception guarantees in the container that `a.push_back( a[0] );` is safe, and it is more straightforward than this approach. While it is dangerous in a hand crafted container, it is safe in a standards conforming implementation. – David Rodríguez - dribeas Jun 02 '11 at 08:30
  • 2
    This is an ideal way for employing the unary-+ operator. `a.push_back(+a[0])` – Johannes Schaub - litb Jun 02 '11 at 10:57
  • 2
    @Johannes No, that’s just being clever and obscuring semantics without any benefit. – Konrad Rudolph Oct 03 '11 at 11:30
0

In push_back(const T& el); implementation to check if el is inside array or other internal storage. That is the only politically correct way of dealing with such problems.

Container should handle this as different containers - different safety rules.

c-smile
  • 26,734
  • 7
  • 59
  • 86
  • Most straightforward solution would be for `push_back` to do the following: allocate new storage, copy move existing elements, append new element, destroy old storage. – Ben Voigt Jun 02 '11 at 05:27
  • 2
    @Ben But in this case, the new element will then refer to a moved from object. – Luc Danton Jun 02 '11 at 06:14
  • 2
    @Luc Danton: The correct sequence (and how it is implemented in g++) is: allocate, copy construct new element, copy/move all other elements. The reason is that the container must be left untouched as per 12.1/10 if an exception is thrown, and copy construction can throw. – David Rodríguez - dribeas Jun 02 '11 at 08:37
  • @Ben Voigt: That is almost the correct approach, but a conforming compiler must first *copy construct* the new element (if it is to use move semantics), as it must guarantee that if that throws, the container is left *untouched*. §12.1/10 second point. In a C++03 compiler (no move semantics) your description is correct. Some compilers implement under the hood move semantics when growing (Dinkumware will *swap* the contained vectors in a `vector >` to avoid the cost of copying actually implementing *move* in C++03 through SFINAE, for example. Those compilers must still be safe. – David Rodríguez - dribeas Jun 02 '11 at 08:41
  • @Luc: You're right, the correct sequence is as described by @David. – Ben Voigt Jun 02 '11 at 13:41
0

This probably isn't a useful answer for you, but IMHO the "right" way is that the container class should handle aliasing correctly, so that the caller doesn't have to worry about it. In particular, push_back() (or equivalent) should do the following:

// C++-ish pseudo-code, exception-safety left as an exercise for the reader
void push_back(const T & t)
{
   if (current_size == alloced_size)
   {
      // Oops, our data array is full.  Time to trade it in for a bigger one
      T * newArray = new T[alloced_size*2];
      copy_items(newArray, current_array, current_size);
      newArray[current_size++] = t;
      delete [] current_array;    // delete old array only AFTER all references to t
      current_array = new_array;
      alloced_size *= 2;
   }
   else current_array[current_size++] = t;
}
Jeremy Friesner
  • 70,199
  • 15
  • 131
  • 234
  • Actually, it *is* useful, but just not complete. I totally agree that the callee needs to take care of this, rather than the caller. My question is, *how do you notice* a bug like this in your own code? What do you look for? – user541686 Jun 03 '11 at 02:20
  • I think what your question really boils down to is, how does one notice a bug (or at least, an undocumented gotcha) in library code? The answer is, unless you are willing to manually read through the STL codebase until you understand it all, there simply isn't a practical way to know -- you mostly just have to trust the libraries to be bug-free. What you *can* do is run your program under valgrind (or Purify, or similar) and see if they report any memory errors at runtime, though. Tools like this will often point out bugs whose symptoms would otherwise be quite subtle and mysterious. – Jeremy Friesner Jun 03 '11 at 23:58
  • I'm not referring to library code in the second question -- I'm asking how I would notice such a bug in my *own* code. Interesting point about Valgrind, thanks. – user541686 Jun 04 '11 at 02:30
-2

I'm just winging this, so please don't consider it gospel, but would this work?

a.push_back(1);
a.push_back(&(new int(a[0])));
King Skippus
  • 3,801
  • 1
  • 24
  • 24
  • 3
    If you want a memory leak, sure! (Edit: Actually, no, it doesn't work anyway. You need to use `*` instead of `&`.) – user541686 Jun 02 '11 at 05:19
  • Which is why I specifically stated, "I'm just winging this, so please don't consider it gospel..." It was merely intended as an idea to try and test. – King Skippus Jun 02 '11 at 05:25