Out of bounds array accesses in C++ and reinterpret_cast

Question

Say I have code like this:

struct A {
  int header;
  unsigned char payload[1];
};

A* a = reinterpret_cast<A*>(new unsigned char[sizeof(A)+100]);

a->payload[50] = 42;

Is this undefined behavior? Creating a pointer that points outside payload should be undefined AFAIK, but I'm unsure whether this is also true in the case where I have allocated the memory after the array.

The standard says p[n] is the same as *(p + n) and "if the expression P points to the i-th element of an array object, the expressions (P)+N point to the i+n-th elements of the array". In the example, payload points to an element in the array allocated with new, so this might be OK.
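
To make the quoted equivalence concrete, here is a minimal sketch that reads one element through both spellings. The 4-element array and the helper names are made up for illustration, and the index deliberately stays inside the array, so this says nothing about whether the payload[50] access above is valid.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when &p[n] and p + n designate the same address.
// Only meaningful while n stays inside the array that p points into.
bool subscript_matches_arithmetic(const unsigned char* p, std::size_t n)
{
    return &p[n] == p + n;
}

// Hypothetical helper: reads index n (n < 4) of a small array through
// both p[n] and *(p + n) and checks that the two reads agree.
unsigned char read_both_ways(std::size_t n)
{
    unsigned char buf[4] = {10, 20, 30, 40};
    assert(n < 4);                          // stay inside the array bounds
    unsigned char via_subscript = buf[n];
    unsigned char via_pointer = *(buf + n);
    assert(via_subscript == via_pointer);
    return via_subscript;
}
```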

If possible, it would be nice if your answers contained references to the C++ standard.

Out of scope, but out of curiosity: `unsigned char`? Is there a `signed char`? — Khalil Khalaf, Jul 01 '16 at 14:47
@FirstStep https://msdn.microsoft.com/nl-nl/library/s3f49ktz.aspx Yes, there is. — Hatted Rooster, Jul 01 '16 at 14:48
@FirstStep a `char` is usually (but not always) a signed char, and there's `signed char`, which is always signed. — alain, Jul 01 '16 at 14:49
The undefined part might be that the layout of the A object is implementation dependent. For many implementations the start of payload will be at the fourth byte of the object; in that case payload[50] will be the 54th byte in the 100 bytes allocated. And thus this should give defined behaviour. However, sizeof(int) is not always 4 bytes; and payload does not need to be aligned at the start of A + sizeof(int). — Klamer Schutte, Jul 01 '16 at 14:56
@KlamerSchutte I added a sizeof(A) to my allocation to make sure that the allocated array is big enough for the access. — adrianN, Jul 01 '16 at 14:58
This may not be sufficient, you need at least sizeof(int)+50+alignment of payload. — Jean-Baptiste Yunès, Jul 01 '16 at 15:02
The standard ends with "provided they exist", so you need to take the alignment in account. — Jean-Baptiste Yunès, Jul 01 '16 at 15:03
My gut feeling is that you're ok provided that `A` is a POD, but there are much nicer ways of dealing with this. — Richard Hodges, Jul 01 '16 at 15:04
@BaummitAugen the size in the type may be less than the size of the array; out of bounds refers to access outside the storage, and this is not related to the declared size in the type, since an array identifier decays to a pointer. — Jean-Baptiste Yunès, Jul 01 '16 at 15:08
I believe you want `static_cast` here. Also, to get well-defined behavior, you might need to use placement new to create an actual object, I'm not sure. Better to post a question on how to do this correctly! — , Jul 01 '16 at 15:55
I think this is duplicate of: http://stackoverflow.com/a/4413035/471160 — marcinj, Jul 01 '16 at 16:06
@Jean-BaptisteYunès I'm confused. Are people still trying to defend this old hack? "out of bounds refers to access outside the storage, and this is not related to the declared size in the type" - no, it doesn't, and yes it is. Accessing an index outwith the size defined for the array is UB, plain and simple. Whether or not the array is a member of some containing object, which owns additional storage to which _you wish_ the out-of-bounds access refers, is irrelevant. The fact that array names implicitly convert to pointers in many cases is also irrelevant. This has always been UB in C and C++. — underscore_d, Jul 03 '16 at 05:01
@underscore_d Ok but what about pointers and dynamically allocated memory? No size on the type, but no UB 'til you access what has been allocated. I think too many people over-interpret what UB is in those cases. And what about flexible array member? — Jean-Baptiste Yunès, Jul 03 '16 at 06:03
@Jean-BaptisteYunès "I think too many people over-interpret what UB is" There's no room for interpretation here: `reinterpret_cast` has _very few_ uses that aren't UB or _at best_ implementation-defined, and this clearly isn't one of them. And flexible array members are not part of C++, so I don't know why you mention them. — underscore_d, Jul 03 '16 at 14:28
@alain `char` is a distinct type that isn't required to be equivalent to either of `signed char` or `unsigned char`, whether or not it might be in practical implementations. (an academic distinction, probably, but that describes many things in this language, which are still worth knowing) — underscore_d, Jul 03 '16 at 14:32
@underscore_d yes they are distinct types, but (3.9.1) *In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.* — alain, Jul 03 '16 at 16:29
@alain Yeah, a plain `char` will mirror the available values of one of the two qualified types, depending on implementation, but the three are distinct in that they get different `typeid`s and etc. Just wanted to add that nuance. — underscore_d, Jul 03 '16 at 16:40
@underscore_d yes, that's indeed interesting, since that's different from `int`/`signed int` which are the same. Thanks! — alain, Jul 03 '16 at 17:24
"When an expression that has integral type is added to a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element", and the standard refers to `new` as an allocator for array objects, and `payload` decays to a pointer. As the cast is ok... or? — Jean-Baptiste Yunès, Jul 04 '16 at 07:02

score 2 · Accepted Answer · answered Jul 01 '16 at 15:11

So the reinterpret_cast is undefined behavior: we can reinterpret_cast to a char or unsigned char, but we can never cast from a char or unsigned char. If we do:

Accessing the object through the new pointer or reference invokes undefined behavior. This is known as the strict aliasing rule.

So yes this is a violation of the strict aliasing rule.
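
For comparison, a minimal sketch of one way to get a similar layout without accessing an A through a pointer that was merely cast from unsigned char*. The helper name write_and_read_back is hypothetical and not from the question. It creates an actual A with placement new at the front of the buffer, and reaches the extra bytes through the unsigned char array that new[] really created, using offsetof. As I understand the rules this avoids both the aliasing problem and the out-of-bounds subscript on payload, but treat it as a sketch under those assumptions rather than a ruling on the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct A {
    int header;
    unsigned char payload[1];
};

// Hypothetical helper: allocates sizeof(A) + extra bytes, constructs an A at
// the start of that storage, then writes `value` at byte offset
// offsetof(A, payload) + index and reads it back.
unsigned char write_and_read_back(std::size_t extra, std::size_t index,
                                  unsigned char value)
{
    const std::size_t total = sizeof(A) + extra;
    const std::size_t offset = offsetof(A, payload) + index;
    assert(offset < total);                 // stay inside the allocation

    // Storage from new unsigned char[] is suitably aligned for an A here.
    unsigned char* raw = new unsigned char[total]();
    A* a = new (raw) A{};                   // starts the lifetime of an A
    a->header = 7;                          // ordinary member access on a real A

    raw[offset] = value;                    // arithmetic on the char array itself
    unsigned char result = raw[offset];

    delete[] raw;                           // A is trivially destructible
    return result;
}
```

Calling write_and_read_back(100, 50, 42) mirrors the sizes and index used in the question.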

answered Jul 01 '16 at 15:11

Jonathan Mee

I suspected that this violates strict aliasing. Could you provide a more explicit reference/a more detailed explanation for "we can never cast from a char"? – adrianN Jul 01 '16 at 15:13
@adrianN Click on the first `reinterpret_cast` ;) – Jonathan Mee Jul 01 '16 at 15:13
This is for dynamic type not POD. – Jean-Baptiste Yunès Jul 04 '16 at 06:36
@Jean-BaptisteYunès Wait are you saying that these rules do not apply to POD types? I had been informed that this applied to POD types as well: http://stackoverflow.com/questions/28697626/why-doesnt-reinterpret-cast-force-copy-n-for-casts-between-same-sized-types If you know differently please share! – Jonathan Mee Jul 04 '16 at 12:33
Your link only covers alignment problems and trap representations, neither of which is involved here. – Jean-Baptiste Yunès Jul 04 '16 at 13:03
@Jean-BaptisteYunès Can you enlighten me on what you mean here? It sounds to me as though you're expressing that: Because the `unsigned char*` that the memory was originally allocated to does not persist we cannot have an aliasing issue. Is that what you mean by trap problems? – Jonathan Mee Jul 06 '16 at 12:28

score 1 · Answer 2 · answered Jul 01 '16 at 15:15

Consider the code:

struct {char x[4]; char a; } foo;

int work_with_foo(int i)
{
  foo.a = 1;
  foo.x[i]++;
  return foo.a;
}

Even though the program would "own" the storage at foo.x+4, the fact that access via the array type is only defined for the first four elements would allow a compiler to, among other things, replace the above code with either of the following:

int work_with_foo(int i)  { foo.a = 1; foo.x[i]++; return 1; }

int work_with_foo(int i)  { foo.x[i]++; foo.a = 1; return 1; }

The above substitutions are clearly permissible under the Standard. It is less clear what alternate ways of writing the increment would force the compiler to behave as though it reloads foo.a. For example, I think the code *(i+(char*)&foo)+=1; would have defined behavior when i equals the offset of foo.a, and I would think the same should be true of *(i+(char*)&foo.x)+=1; but I'm not sure about *(i+foo.x)+=1; or *(i+(char*)foo.x)+=1;.
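
A minimal sketch of the first alternative mentioned above, the one that forms the address from (char*)&foo instead of from foo.x. The struct is given the hypothetical name Foo so that offsetof can be used, and bump_byte_of_foo is an invented helper. Whether the standard fully blesses this byte-wise arithmetic is exactly the uncertainty described above, so take it as an illustration of the idea rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

struct Foo { char x[4]; char a; };   // same layout as the struct above, but named
Foo foo;

// Increments the byte at `offset` from the start of foo, addressing it
// through a char* derived from &foo rather than from the member array foo.x.
int bump_byte_of_foo(std::size_t offset)
{
    assert(offset < sizeof(Foo));    // stay inside the object
    foo.a = 1;
    char* bytes = reinterpret_cast<char*>(&foo);
    *(bytes + offset) += 1;          // may legitimately modify foo.a
    return foo.a;                    // so the compiler has to reload it here
}
```

With offset equal to offsetof(Foo, a) the write lands on foo.a itself, which is the case where returning a cached 1 would be wrong.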

Richard Hodges · Answer 3 · 2016-07-01T15:48:37.080

This old C hack is never necessary in C++.

Consider:

#include <cstddef>  // std::size_t
#include <utility>  // std::move
#include <memory>   // std::unique_ptr, std::make_unique

template<std::size_t Size>
struct A {
  int header;
  unsigned char payload[Size];
};

struct polyheader
{
  // Named concept_t because `concept` is a keyword since C++20.
  struct concept_t
  {
    virtual int& header() = 0;
    virtual unsigned char* payload() = 0;
    virtual std::size_t size() const = 0;
    virtual ~concept_t() = default;  // needed: models are deleted through a concept_t pointer
  };

  template<std::size_t Size>
  struct model : concept_t
  {
    using a_type = A<Size>;
    model(a_type a) : _a(std::move(a)) {}
    int& header() override {
      return _a.header;
    }

    unsigned char* payload() override {
      return _a.payload;
    }

    std::size_t size() const override {
      return Size;
    }

    A<Size> _a;
  };

  int& header() { return _impl->header(); }
  unsigned char* payload() { return _impl->payload(); }
  std::size_t size() const { return _impl->size(); }

  template<std::size_t Size>
  polyheader(A<Size> a) 
    : _impl(std::make_unique<model<Size>>(std::move(a)))
    {}

  std::unique_ptr<concept_t> _impl;
};


int main()
{
  auto p1 = polyheader(A<40>());
  auto p2 = polyheader(A<80>());

}

This doesn't work unless the size is known at compile time and you can tolerate the performance penalties of having your most basic operations being non-inlined and virtual dispatch! — , Jul 01 '16 at 15:50
@Hurkyl if you look at the question you will see that the OP does indeed know the size at compile time. If you wish to use the template `A` directly, of course you may. It is trivial to extend this class to deal with variable-sized buffers if necessary. The poly wrapper is merely here to provide polymorphism should the OP require it. — Richard Hodges, Jul 01 '16 at 15:53
+1 purely for `This old C hack is never necessary in C++.` There's this thing called `std::vector` that's generating a lot of buzz among the innovators. — underscore_d, Jul 03 '16 at 05:07
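
A minimal sketch of the `std::vector` route mentioned in the last comment, for the case where the payload size is only known at run time. Message and store_and_load are hypothetical names, not from the thread. The trade-off is that the payload lives in a separate allocation and is no longer contiguous with the header, which may or may not matter for the OP's use case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's struct A: the payload is a
// vector, so its size is chosen at run time and every index is checkable.
struct Message {
    int header;
    std::vector<unsigned char> payload;

    explicit Message(std::size_t payload_size)
        : header(0), payload(payload_size) {}   // payload bytes start at zero
};

// Writes `value` at `index` and reads it back. at() throws
// std::out_of_range on a bad index instead of invoking undefined behavior.
unsigned char store_and_load(std::size_t payload_size, std::size_t index,
                             unsigned char value)
{
    Message m(payload_size);
    m.payload.at(index) = value;
    return m.payload.at(index);
}
```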
