I am trying to understand the strict aliasing rule. After reading the answer to "What is the strict aliasing rule?", I have some further questions. Here I borrow the example from that post:

struct Msg { uint32_t a, b; };
void SendWord(uint32_t);
void SendMessage(uint32_t* buff, uint32_t buffSize)
{
  for (uint32_t i = 0; i < buffSize; ++i) SendWord(buff[i]);
}

Now consider the following code (A):

uint32_t *buff = (uint32_t*)malloc(sizeof(Msg));
std::memset(buff, 0, sizeof(Msg));
Msg *msg = (Msg*)buff; // Undefined behavior.
for (uint32_t i = 0; i < 10; ++i)
{
  msg->a = i;
  msg->b = i + 1;
  SendMessage(buff, 2);
}
free(buff);

For the above code, the author explained what might happen due to the UB. The message sent could contain all 0s: during optimization, the compiler may assume msg and buff point to disjoint memory blocks, and then decide the writes to msg do not affect buff.

But how about the following code (B):

uint32_t *buff = (uint32_t*)malloc(sizeof(Msg));
std::memset(buff, 0, sizeof(Msg));
unsigned char *msg = (unsigned char*)buff; // Compliant.
for (uint32_t i = 0; i < sizeof(Msg); ++i)
{
  msg[i] = i + 1;
  SendMessage(buff, 2);
}
free(buff);

Is the sent message guaranteed to be as intended (as if the strict aliasing compiler flag were off)? If so, is it simply because a char* (msg) points to the same place as buff, so the compiler should and will notice it and refrain from the optimization described for (A)?

Yet I then read another comment under the post "Strict aliasing rule and 'char *' pointers", saying that it is UB to use the char* pointer to write to the referenced object. So could code (B) still result in similar, unexpected behavior?

user2961927
  • Read [Type aliasing](https://en.cppreference.com/w/cpp/language/reinterpret_cast). Specifically allowed: "AliasedType is std::byte, (since C++17) char, or unsigned char: this permits examination of the object representation of any object as an array of bytes." – Richard Critten Jan 20 '22 at 00:27
  • @RichardCritten That looks like it could be fleshed out into an answer. – cigien Jan 20 '22 at 01:23
  • @RichardCritten I see. So the full sentence is "Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true: (3) AliasedType is std::byte, (since C++17) char, or unsigned char". This means writing to that char pointer is valid right? I'll accept your answer if you want to flesh it out. – user2961927 Jan 20 '22 at 01:52
  • The answer you link to only applies to C (and furthermore, is wrong!); the strict aliasing rule is different in C and C++. – M.M Jan 20 '22 at 02:20
  • @M.M By wrong, were you referring to the second link in my post? – user2961927 Jan 20 '22 at 02:42
  • By "wrong" I am referring to the answer https://stackoverflow.com/a/99010/1505939 (the first code sample in that answer is correct in C) – M.M Jan 20 '22 at 02:48

1 Answer

First of all, the answer https://stackoverflow.com/a/99010/1505939 applies to C (and is in fact completely wrong, but that's another story). You ask about C++, and the strict aliasing rule is set up differently in C than in C++. So that answer has nothing to do with this question.

Prior to C++20, both versions of your code cause undefined behaviour (by omission) as the behaviour of the assignment operator is only defined for the case of writing to an object. The malloc function in C++ allocates space but does not create objects within that space. The task should be approached using various other constructs such as new, or higher level containers, which do both allocate space and create objects within the space ready for writing.
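As a rough sketch of the "higher level containers" route, assuming the goal is simply to send the two words of a Msg: the buffer below is a std::vector, so real uint32_t objects exist before anything is written, and the Msg is copied into it with std::memcpy instead of being accessed through a cast pointer. SendWord here is a hypothetical stand-in that records what it receives, g_sent and SendCounted are names invented for this illustration, and the static_assert spells out the assumption that Msg has no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Msg { std::uint32_t a, b; };

// Hypothetical stand-in for the real SendWord: it just records each word.
std::vector<std::uint32_t> g_sent;
void SendWord(std::uint32_t word) { g_sent.push_back(word); }

void SendMessage(const std::uint32_t* buff, std::size_t buffSize)
{
  assert(buff != nullptr);
  for (std::size_t i = 0; i < buffSize; ++i) SendWord(buff[i]);
}

// Hypothetical driver: sends `count` messages {i, i + 1}.
void SendCounted(std::uint32_t count)
{
  // Assumption of this sketch: Msg is exactly two words with no padding.
  static_assert(sizeof(Msg) == 2 * sizeof(std::uint32_t), "Msg must have no padding");

  // The vector allocates storage AND creates the uint32_t objects in it.
  std::vector<std::uint32_t> buff(sizeof(Msg) / sizeof(std::uint32_t));

  for (std::uint32_t i = 0; i < count; ++i)
  {
    Msg msg{i, i + 1};
    // Copy the bytes of msg into the buffer; no Msg* ever points at buff.
    std::memcpy(buff.data(), &msg, sizeof msg);
    SendMessage(buff.data(), buff.size());
  }
}
```

Calling SendCounted(2) in this sketch records the words 0, 1, 1, 2 in g_sent. Because the buffer is only ever read and written as uint32_t (or copied bytewise), the question of what a cast pointer may alias does not come up.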

Trying to analyze this code (pre-C++20) in the context of the C++ strict aliasing rule is not possible: the rule is defined in terms of accessing the stored value of an object, and this code does not access any object, since no object was created.


Since C++20 there is a new provision (N4860 [intro.object]/10) that objects can be implicitly created by certain operations (malloc is specified as one of them), if there exists a possible set of objects that would make the code well-defined. (Otherwise the behaviour remains undefined.)

Under this change to the object model, both of your code samples are well-defined. In (A) there can be implicitly-created uint32_t objects in the space, and in (B) there can be implicitly-created unsigned char objects in the space. Since your code does not write as one type and then read as a different type, there is no possibility of an aliasing violation.
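A minimal sketch of the "write as one type, read as the same type" case, assuming a C++20 compiler: the storage comes from malloc and is written and read only as uint32_t, so under the C++20 object model described above there is a set of implicitly created uint32_t objects that makes the accesses well-defined. SumViaMalloc is a name invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: stores two words in malloc'ed storage and reads
// them back through the same type they were written with.
std::uint32_t SumViaMalloc(std::uint32_t x, std::uint32_t y)
{
  void* raw = std::malloc(2 * sizeof(std::uint32_t));
  if (raw == nullptr) return 0; // allocation failed; nothing to illustrate

  auto* words = static_cast<std::uint32_t*>(raw);
  words[0] = x; // written as uint32_t
  words[1] = y;
  std::uint32_t sum = words[0] + words[1]; // read as uint32_t

  std::free(raw);
  return sum;
}
```

Pre-C++20, the same code falls under the "undefined by omission" reading given earlier in this answer, even though compilers in practice treat it the same way.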

The existence of intermediate pointers of various types such as buff has no bearing on strict aliasing (in either language) -- the rule is strictly about how the space is read and written; not about how we got to the space.
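To illustrate that point, here is a small sketch (RoundTrip is a hypothetical name): the pointer passes through unsigned char* on the way, but the only access to the object is a write through a uint32_t lvalue, which is the object's own type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: routes a pointer through an intermediate type
// without ever accessing the object through that intermediate type.
std::uint32_t RoundTrip(std::uint32_t value)
{
  std::uint32_t word = 0;

  // Intermediate pointer of a different type; never dereferenced here.
  unsigned char* bytes = reinterpret_cast<unsigned char*>(&word);

  // Converting back to the original pointer type yields the original pointer.
  std::uint32_t* back = reinterpret_cast<std::uint32_t*>(bytes);
  assert(back == &word);

  *back = value; // the access itself is uint32_t through uint32_t
  return word;
}
```

The cast to unsigned char* and back changes nothing about how the storage is accessed, so the aliasing rule has nothing to object to.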

M.M
  • I do not think this is correct. The implicitly-created object would need to be a `Msg` (with its subobjects). Otherwise `msg->a = i` is going to be undefined behavior. Passing it as `uint32_t*` is then allowed only because `Msg` is standard-layout and the first member is `uint32_t`, thus pointer-interconvertible. However, pointer arithmetic in `buff[i]` to get to the second member is then not allowed. In (B) the created objects would need to be an (array of) `uint32_t`, but accessing through `unsigned char` will be allowed. – user17732522 Jan 20 '22 at 05:13
  • @user17732522 well there's no `buff[i]` in the actual code samples A and B being asked about – M.M Jan 21 '22 at 00:26
  • You bring up the old question of whether `msg->a` requires all of `*msg` to exist, or just an element of the correct type for `a`. This has never been specified clearly in the standard, although I do concede that the view that all of `*msg` must exist is the more common one – M.M Jan 21 '22 at 00:29