
I have two questions: a general one about accessing memory through pointers of different types, and one about a specific case I have.

What happens when you access a buffer of memory using pointers of different types?

In practice, on many different compilers, it seems to work the way I picture it. However, I sort of know it's UB in many (if not all) cases. For example:

typedef unsigned char byte;
struct color { /* stuff */};

std::vector<color> colors( 512 * 512 );
// pointer of one type
color* colordata = colors.data();
// pointer to another type?
byte* bytes = reinterpret_cast<byte*>( colordata );

// Proceed to read from (potentially write into) 
// the "bytes" of the 512 * 512 heap array

The first question would be: Is there any point where doing this kind of conversion is legal/safe/standard-sanctioned?

The second question: spinning off the first, if you knew that the struct named color was defined as:

 struct color { byte c[4]; };

Now, is it legal/safe/standard-sanctioned? Read-safe? Read/write-safe? I'd like to know, because my intuition tells me that for these very simple structs the naughty pointer manipulation above isn't that bad. Or is it?

[ Reopen Reasons: ] While the linked question about strict aliasing applies somewhat here, it is mostly about C. The one answer referencing the C++03 standard may be outdated compared to the C++11 standard (unless absolutely nothing has changed). This question has a practical application, and others and I would benefit from more answers. Finally, this question specifically asks whether the access is read-safe, write-safe, both, or neither, in two different scenarios: POD data where the underlying types match, and the more general case of arbitrary internal data.

  • The most obvious counter question would be "why would you want to do that in the first place?". – stefan May 08 '13 at 10:14
  • This has come from chat, we have no idea why he wants to do this, even having told him many better solutions to his problem. – thecoshman May 08 '13 at 10:15
  • @thecoshman he doesn't even know it himself. –  May 08 '13 at 10:17
  • Take something like working with PNG files: after memory-mapping the file to work with the bytes in it, PNG specifies its decoding and encoding process in terms of `bytes`, despite its final product being RGBA or RGBX colors. Knowing whether or not I can read/write from the color buffer as bytes is useful in this one case. –  May 08 '13 at 10:17
  • You might want to learn about the strict aliasing rule: http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule – pmr May 08 '13 at 10:18
  • As you have already been told, read the byte data from the png data, then push it into a native format you wish to work with. – thecoshman May 08 '13 at 10:19
  • This is legal, you're aliasing through an `unsigned char`. It's explained in the dupe. – jrok May 08 '13 at 10:22
  • @jrok Well, I guess I'll just delete the question then. I'm glad I could figure it out though. –  May 08 '13 at 10:26
  • @jrok What dupe? Do you mean the question linked to by pmr? – thecoshman May 08 '13 at 10:27
  • @thecoshman yes, that one. – jrok May 08 '13 at 10:33
  • @jrok This is not a dupe of that. Shall we dance the 'C++ is not C' dance one more time? – thecoshman May 08 '13 at 10:34
  • You'll be dancing alone :) The answers there apply to C++ too. – jrok May 08 '13 at 10:35
  • They may, but they are targeted towards C specifically. If C++ diverges, this question would no longer have any relation to that one, other than perhaps for historic reasons. Whilst it is a nice reference, it is clearly not a dupe. – thecoshman May 08 '13 at 10:37
  • @thecoshman I'm not aware of any difference between C++/C or C++11/03 here. – pmr May 08 '13 at 15:16
  • @pmr I think you missed my point – thecoshman May 09 '13 at 07:58
  • @pmr: C++11 contains some fairly different wording about what is and is not legal for unions. – Puppy Jul 13 '13 at 02:49

2 Answers


Both are legal.

Firstly, since byte is a typedef for unsigned char, it gets a magical get-out-of-jail-free card when it comes to strict aliasing. You can alias any type as char or one of its signed or unsigned variants.
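As a minimal sketch of that first point, assuming the `color { byte c[4]; }` definition from the question: the helpers below (`sum_bytes` and `fill_bytes` are hypothetical names, not from the question) read and write the vector's storage through an `unsigned char` pointer. The `static_assert` records the assumption that the struct has no padding.

```cpp
#include <cstddef>
#include <vector>

typedef unsigned char byte;
struct color { byte c[4]; };

// Assumption: the compiler adds no padding to color.
static_assert(sizeof(color) == 4, "color is assumed to be exactly 4 bytes");

// Hypothetical helper: reads the buffer through an unsigned char pointer.
unsigned long sum_bytes(const std::vector<color>& colors) {
    const byte* bytes = reinterpret_cast<const byte*>(colors.data());
    unsigned long total = 0;
    for (std::size_t i = 0; i < colors.size() * sizeof(color); ++i)
        total += bytes[i];
    return total;
}

// Hypothetical helper: writes every byte of the buffer through the same view.
void fill_bytes(std::vector<color>& colors, byte value) {
    byte* bytes = reinterpret_cast<byte*>(colors.data());
    for (std::size_t i = 0; i < colors.size() * sizeof(color); ++i)
        bytes[i] = value;
}
```

Because the members of `color` are themselves `unsigned char`, the write through `bytes` only ever stores values the members can hold.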

Secondly, it is entirely legal in both C and C++ for a pointer to a struct to be cast to a pointer to the type of its first member, as long as the struct meets certain guarantees, such as being POD. This means that

struct x {
    int f;
};
int main() {
    x var;
    int* p = (int*)&var;
}

does not violate strict aliasing either, even without the get-out clause used for char.
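A hedged sketch of how one might have the compiler check that guarantee in C++11: `std::is_standard_layout` is the trait that corresponds to the first-member rule there, and `read_first` / `write_first` are hypothetical helper names used only for illustration.

```cpp
#include <type_traits>

struct x {
    int f;
};

// Fails to compile if x ever stops being a standard-layout type,
// which is the property the first-member conversion relies on.
static_assert(std::is_standard_layout<x>::value,
              "x must be standard-layout for the first-member cast");

// Hypothetical helper: reads the first member through an int pointer.
int read_first(x& var) {
    int* p = reinterpret_cast<int*>(&var);
    return *p;
}

// Hypothetical helper: writes the first member through an int pointer.
void write_first(x& var, int value) {
    *reinterpret_cast<int*>(&var) = value;
}
```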

Puppy

As has been stated in the comments: accessing the same piece of memory as two different types is UB. So, that's the formal answer. (Note that "UB" includes "doing precisely what you would expect if you are a sane person reading the code" as well as "just about anything other than what a sane person reading the code would expect".)

Having said that, it appears that all popular compilers tend to cope with this fairly well. It is not unusual to see this sort of construct in "good" production code, even if the code isn't strictly language-lawyer correct. However, you are at the mercy of the compiler "doing the right thing", and it's definitely a case where you may find compiler bugs if you stress things too harshly.

There are several reasons that the standard defines this as UB, the main ones being that "different types of data may be stored in different memory" and that "it can be hard for the compiler to figure out what is safe when someone is mucking about casting pointers to the same data with different types". For example, if we have a pointer to a 32-bit integer and another pointer to char, both pointing to the same address, when is it safe to read the integer value after the char value has been written? By defining it as UB, the standard leaves it entirely up to the compiler vendor to decide how precisely to treat these conditions. If it were "defined" that this will work, compilers might not be viable for certain processor types (or code would become horribly slow due to a liberal sprinkling of "make sure partial memory writes have completed before I read" operations, even when those are generally not needed).

So, in summary: It will most likely work on most processors, but don't expect any language lawyer to approve of your code.
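For cases where you would rather not depend on the compiler at all, one commonly suggested alternative is to copy the bytes with `std::memcpy` instead of viewing the same storage through two pointer types. A minimal sketch, assuming the 4-byte `color` struct from the question and a hypothetical helper name `colors_from_bytes`:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

typedef unsigned char byte;
struct color { byte c[4]; };

// Assumption: the compiler adds no padding to color.
static_assert(sizeof(color) == 4, "color is assumed to be exactly 4 bytes");

// Hypothetical helper: copies raw bytes (e.g. decoded PNG data) into colors.
// Trailing bytes that do not fill a whole color are ignored.
std::vector<color> colors_from_bytes(const byte* data, std::size_t byte_count) {
    std::vector<color> out(byte_count / sizeof(color));
    if (!out.empty())
        std::memcpy(out.data(), data, out.size() * sizeof(color));
    return out;
}
```

The cost is one extra copy, which may or may not matter for a 512 * 512 buffer.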

Mats Petersson
  • I'll accept this answer for now, seeing as the question got canned. It was unfortunate because I was actually hoping to see if any language lawyers would show up to let me know just "how UB" the code was, or rather which parts were UB and if there were ways around it that still allowed for access through differently typed pointers... –  May 08 '13 at 10:56
  • Ok, so the "strictly legal" says "You can't access memory from the same place using different pointers if you do it in separate functions". This also applies to `union` access, by the way. – Mats Petersson May 08 '13 at 11:16
  • @ThePhD UB is UB. Your specific example isn't, because you hit a corner case. The linked question explains all this and also has reasonable workarounds for cases where something like this is really needed. – pmr May 08 '13 at 15:15
  • Actually, neither of these examples is illegal. I certainly don't *approve* of this code, but it does not violate the specification. – Puppy Jul 13 '13 at 02:47