12

Following a question asked here earlier today and a multitude of similarly themed questions, I'm here to ask about this problem from the standard's viewpoint.

struct Base
{
  int member;
};

struct Derived : Base
{
  int another_member;
};

int main()
{
  Base* p = new Derived[10]; // (1)
  p[1].member = 42; // (2)
  delete[] p; // (3)
}

According to the standard, (1) is well-formed, because Derived* (which is the result of the new-expression) can be implicitly converted to Base* (C++11 draft, §4.10/3):

A prvalue of type “pointer to cv D”, where D is a class type, can be converted to a prvalue of type “pointer to cv B”, where B is a base class (Clause 10) of D. If B is an inaccessible (Clause 11) or ambiguous (10.2) base class of D, a program that necessitates this conversion is ill-formed. The result of the conversion is a pointer to the base class subobject of the derived class object. The null pointer value is converted to the null pointer value of the destination type.

(3) leads to undefined behaviour because of §5.3.5/3:

In the first alternative (delete object), if the static type of the object to be deleted is different from its dynamic type, the static type shall be a base class of the dynamic type of the object to be deleted and the static type shall have a virtual destructor or the behavior is undefined. In the second alternative (delete array) if the dynamic type of the object to be deleted differs from its static type, the behavior is undefined.

Is (2) legal according to the standard, or does it lead to an ill-formed program or undefined behaviour?

edit: Better wording

Vitus
  • Why are we assuming (2) is ill-formed? – Kerrek SB Aug 25 '11 at 22:09
  • (2) is very ill-formed because it uses sizeof(Base) to compute the distance between p[0] and p[1]. – Bo Persson Aug 25 '11 at 22:12
  • It's not ill-formed, it's just UB because p doesn't point to an element of an array object (the condition for pointer arithmetic to work), it points to a base class sub-object of an array element so the array access is invalid. – CB Bailey Aug 25 '11 at 22:15
  • @Kerrek SB: Perhaps the last question should have been worded a little bit differently, but since major implementations (tested with gcc, clang and MSVC) don't get it right, I _assumed_ `(2)` is ill-formed. I spent the last two hours searching for something like what Bo Persson said, i.e. that `(p + n)` uses the static type of `p` to compute the offset, but I got the feeling that the paragraph concerning `operator+` doesn't imply this. – Vitus Aug 25 '11 at 22:19
  • @Charles Bailey: Oh, that indeed makes sense. Please, do post it as answer. – Vitus Aug 25 '11 at 22:20
  • Sorry, I don't have access to the standard right now but there's an "otherwise the behavior is undefined" in the section on the addition operator when applied to a pointer and an integer that describes the requirement. (IIRC) – CB Bailey Aug 25 '11 at 22:28
  • @Charles Bailey: I think this is the part of standard you speak of (§5.7/5): (...) If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. – Vitus Aug 25 '11 at 22:33
  • Strictly speaking, if this is the case, then the behaviour is undefined even when `sizeof(Base) == sizeof(Derived)`, though most implementations will get it "right". – Vitus Aug 25 '11 at 22:37

4 Answers

5

If you look at the expression p[1], p is a Base* (Base is a completely-defined type) and 1 is an int, so according to ISO/IEC 14882:2003 5.2.1 [expr.sub] this expression is valid and identical to *((p)+(1)).

From 5.7 [expr.add] / 5, when an integer is added to a pointer, the result is only well defined when the pointer points to an element of an array object and the result of the pointer arithmetic also points to an element of that array object or one past the end of the array. p, however, does not point to an element of an array object; it points at the base class sub-object of a Derived object. It is the Derived object that is an array member, not the Base sub-object.

Note that under 5.7 / 4, for the purposes of the addition operator, the Base sub-object can be treated as an array of size one, so technically you can form the address p + 1, but as a "one past the last element" pointer, it doesn't point at a Base object and attempting to read from or write to it will cause undefined behavior.
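For contrast, here is a minimal sketch of an access that stays within the rules quoted above: the subscript is applied to a Derived*, which really does point to an array element, and only the resulting element pointer is converted to Base*. The helper name set_member_via_base and the fixed element count are assumptions made for illustration; they are not part of the question.

```cpp
#include <cassert>
#include <cstddef>

struct Base { int member; };
struct Derived : Base { int another_member; };

// Hypothetical helper: writes `value` into the Base part of element `index`.
// The subscript is applied to a Derived*, so pointer arithmetic happens on
// real array elements; only one element's address is converted to Base*.
int set_member_via_base(std::size_t index, int value)
{
  const std::size_t count = 10;
  assert(index < count);

  Derived* d = new Derived[count]();  // value-initialised: all members are 0
  Base* b = &d[index];                // derived-to-base conversion of one element
  b->member = value;

  int result = d[index].member;       // read back through the Derived view
  delete[] d;                         // static type matches dynamic type
  return result;
}
```

Calling set_member_via_base(1, 42) performs the write that (2) intended, without doing arithmetic on a Base* and without deleting the array through a Base*.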

CB Bailey
  • I wonder if you might care to weigh in on [a similar question I asked](http://stackoverflow.com/q/19843816/33732) in which `Derived` adds no data members on top of what `Base` already has. It's gotten some answers, but I need help judging what's right. – Rob Kennedy Nov 09 '13 at 05:44
  • @RobKennedy: I've re-read my answer and I don't see why you think it only applies when `sizeof(Derived) != sizeof(Base)`. – CB Bailey Nov 09 '13 at 11:28
4

(3) leads to undefined behaviour, but it is not ill-formed strictly speaking. Ill-formed means that a C++ program is not constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.

Same for (2), it is well-formed, but it does not do what you have probably expected. According to §8.3.4/6:

Except where it has been declared for a class (13.5.5), the subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)). Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1. Therefore, despite its asymmetric appearance, subscripting is a commutative operation.

So in (2) you will get the address (char*)p + sizeof(Base)*1 when you probably wanted to get the address (char*)p + sizeof(Derived)*1.
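The mismatch between the two strides can be observed without ever indexing through a Base*. The following is a sketch only: the function name base_subobject_stride is made up for this example, and it measures the byte distance between the Base subobjects of two neighbouring Derived elements, which can then be compared with sizeof(Base).

```cpp
#include <cstddef>

struct Base { int member; };
struct Derived : Base { int another_member; };

// Hypothetical helper: byte distance between the Base subobjects of two
// neighbouring elements of a Derived array. Both pointers are obtained by
// converting a valid Derived* element pointer, never by Base* arithmetic.
std::ptrdiff_t base_subobject_stride()
{
  Derived d[2];
  Base* b0 = &d[0];
  Base* b1 = &d[1];
  return reinterpret_cast<char*>(b1) - reinterpret_cast<char*>(b0);
}
```

The returned distance equals sizeof(Derived), since each Base subobject sits at the same offset inside its element, whereas p + 1 on a Base* is implemented by the compilers mentioned in the comments as a step of sizeof(Base) bytes.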

Kirill V. Lyadvinsky
  • Silly me, I read undefined and write ill-formed. Fixed, thank you. – Vitus Aug 25 '11 at 22:24
  • The thing is, §5.7/5 does not seem to imply that `(char*)(p + 1) == (char*)p + sizeof(Base)` - unless I'm reading it wrong. That is the point of my question, I think. – Vitus Aug 25 '11 at 22:27
1

The standard doesn't disallow (2), but it's dangerous nevertheless.

The problem is that doing p[1] means adding sizeof(Base) to the base address p, and using the data at that memory location as an instance of Base. But chances are very high that sizeof(Base) is smaller than sizeof(Derived), so you'll be interpreting a block of memory starting in the middle of a Derived object, as a Base object.

More information in C++ FAQ Lite 21.4.
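To make "the middle of a Derived object" concrete, the arithmetic can be sketched as below. This only computes offsets and touches no memory; the names Landing and landing_for_base_index are hypothetical, and it assumes an implementation that steps a Base* by sizeof(Base) bytes, which the standard does not promise for this case.

```cpp
#include <cstddef>

struct Base { int member; };
struct Derived : Base { int another_member; };

// Hypothetical result type for the sketch below.
struct Landing
{
  std::size_t element;  // index of the Derived element containing the byte
  std::size_t offset;   // byte offset inside that element
};

// Where would `k` Base-sized steps land inside an array of Derived?
// Pure arithmetic on sizes; nothing is dereferenced.
Landing landing_for_base_index(std::size_t k)
{
  const std::size_t byte_offset = k * sizeof(Base);
  Landing result = { byte_offset / sizeof(Derived), byte_offset % sizeof(Derived) };
  return result;
}
```

A non-zero offset for a given k means that p[k] would start part-way into a Derived element rather than at its beginning. On an implementation where Derived is exactly twice the size of Base, that would be the case for every odd k.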

Sander De Dycker
0
p[1].member = 42; 

is well-formed. The static type of p is Base*, while the dynamic type of the objects it points into is Derived. p[1] is equivalent to *(p+1), which looks valid and, going by the static type, designates the element at index 1 of an array of Base.

However, the array in fact holds elements of type Derived. The code p[1].member = 42; shows you think you are referring to an array member of type Base.

kiriloff