27

I've been trying to understand the strict aliasing rules as they apply to the char pointer.

Here this is stated:

It is always presumed that a char* may refer to an alias of any object.

Ok so in the context of socket code, I can do this:

struct SocketMsg
{
   int a;
   int b;
};

int main(int argc, char** argv)
{
   // Some code...
   SocketMsg msgToSend;
   msgToSend.a = 0;
   msgToSend.b = 1;
   send(socket, (char*)(&msgToSend), sizeof(msgToSend), 0);
}

But then there's this statement

The converse is not true. Casting a char* to a pointer of any type other than a char* and dereferencing it is usually in violation of the strict aliasing rule.

Does this mean that when I recv a char array, I can't reinterpret cast to a struct when I know the structure of the message:

struct SocketMsgToRecv
{
    int a;
    int b;
};

int main()
{
    SocketMsgToRecv* pointerToMsg;
    char msgBuff[100];
    ...
    recv(socket, msgBuff, 100, 0);
    // Omitting the check that we have a complete message from the stream,
    // but let's assume msgBuff[0] starts a complete msg, and let's interpret the msg

    // SAFE!?!?!?
    pointerToMsg = (SocketMsgToRecv*)&msgBuff[0];

    printf("Got Msg: a: %i, b: %i", pointerToMsg->a, pointerToMsg->b);
}

Will this second example not work because the base type is a char array and I'm casting it to a struct? How do you handle this situation in a strictly aliased world?

cokeman19
Doug T.

2 Answers

6

Re @Adam Rosenfield: The union will achieve alignment so long as the supplier of the char* started out doing something similar.

It may be useful to stand back and figure out what this is all about.

The basis for the aliasing rule is the fact that compilers may place values of different simple types on different memory boundaries to improve access, and that some hardware requires such alignment to be able to use the pointer at all. This also shows up in structs whose elements have different sizes: the struct itself may start on a good boundary, yet the compiler may still introduce slack bytes (padding) in its interior so that each element that needs alignment gets it.

Considering that compilers often have options for controlling how all of this is handled, or not, you can see that there are many ways that surprises can occur. This is particularly important to be aware of when passing pointers to structs (cast as char* or not) into libraries that were compiled to expect different alignment conventions.

What about char*?

The presumption about char* is that sizeof(char) == 1 (relative to the sizes of all other sizable data) and that char* pointers don't have any alignment requirement. So a genuine char* can always be safely passed around and used successfully without concern for alignment, and that goes for any element of a char[] array, performing ++ and -- on the pointers, and so on. (Oddly, void* is not quite the same.)
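
As a minimal sketch of that "char* may look at anything" direction, the helper below walks the bytes of an arbitrary object through an unsigned char pointer. The name sumBytes is hypothetical and only serves the illustration; SocketMsg mirrors the struct from the question.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the question's SocketMsg.
struct SocketMsg
{
    int a;
    int b;
};

// Hypothetical helper: add up the bytes of any object.
// Reading an object's storage through unsigned char* is permitted,
// whatever the object's real type is, and needs no particular alignment.
unsigned sumBytes(const void* obj, std::size_t size)
{
    assert(obj != nullptr);
    const unsigned char* p = static_cast<const unsigned char*>(obj);
    unsigned sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += p[i];
    return sum;
}
```

This is the same kind of access that send performs on the struct in the question's first example.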

Now you should be able to see how if you transfer some sort of structure data into a char[] array that was not itself aligned appropriately, attempting to cast back to a pointer that does require alignment(s) can be a serious problem.

If you make a union of a char[] array and a struct, the most-demanding alignment (i.e., that of the struct) will be honored by the compiler. This will work if the supplier and the consumer are effectively using matching unions so that casting of the struct* to char* and back works just fine.

In that case, I would hope that the data was created in a similar union before the pointer to it was cast to char* or it was transferred any other way as an array of sizeof(char) bytes. It is also important to make sure any compiler options are compatible between the libraries relied upon and your own code.
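
When the char[] buffer cannot be assumed to be suitably aligned, one commonly used alternative to the matching-union approach is to copy the bytes into a real struct object with std::memcpy rather than casting the pointer. The sketch below assumes the sender and receiver agree on the struct layout and byte order; decodeMsg is a hypothetical helper name, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Same layout as the question's SocketMsgToRecv.
struct SocketMsgToRecv
{
    int a;
    int b;
};

static_assert(std::is_trivially_copyable<SocketMsgToRecv>::value,
              "memcpy is only valid for trivially copyable types");

// Hypothetical helper: copy one message out of a raw byte buffer.
// `offset` may be any value; the source bytes need not be aligned,
// because the destination `msg` is a properly aligned object.
SocketMsgToRecv decodeMsg(const char* buf, std::size_t len, std::size_t offset)
{
    assert(offset + sizeof(SocketMsgToRecv) <= len);
    SocketMsgToRecv msg;
    std::memcpy(&msg, buf + offset, sizeof msg);
    return msg;
}
```

For example, decodeMsg(msgBuff, sizeof msgBuff, 0) would replace the pointer cast in the question's second example.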

orcmid
  • are all 3 of `char`, `signed char`, `unsigned char` OK for aliasing ? and with any CV-qualification combination as well ? – v.oddou Apr 22 '15 at 03:36
  • 3
    The aliasing rules have nothing to do with alignment. Per the C89 rationale, given global declarations like `int i; float *fp;`, the purpose is to allow compilers to keep `i` in a register across accesses to `*fp`. The idea was that a compiler shouldn't have to pessimistically assume that a write to `*fp` might alter `i` *when it had no reason to expect* that `*fp` would point at something that wasn't a `float`. I don't think the rule was ever intended to let compilers ignore cases where aliasing is obvious (taking the address of an object should give a compiler a strong clue... – supercat Jun 23 '16 at 19:06
  • ...that the object in question is about to be accessed via pointer, and casting an `int*` to a `float*` should give the compiler a strong clue that an `int` is likely to be modified via write to a `float*`, but gcc no longer feels any obligation to notice such things. – supercat Jun 23 '16 at 19:14
  • 1
    Strict aliasing is not because of alignment requirement. How can this answer get 9 upvotes? – Ajay Brahmakshatriya Nov 13 '17 at 02:37
4

Correct, the second example is in violation of the strict aliasing rules, so if you compile with the -fstrict-aliasing flag, there's a chance you may get incorrect object code. The fully correct solution would be to use a union here:

union
{
  SocketMsgToRecv msg;
  char msgBuff[100];
};

recv(socket, msgBuff, 100);

printf("Got Msg: a: %i, b: %i", msg.a, msg.b);
Adam Rosenfield
  • 1
    Is this in compliance with the standard or just compiler letting you get away with writing to one member and reading from another? – Alex B May 06 '10 at 00:26
  • 10
    The union is completely unnecessary. Simply pass a pointer to the structure (cast to `char *`) to `recv`. – R.. GitHub STOP HELPING ICE Sep 03 '10 at 01:38
  • Note that `-fstrict-aliasing` is on by default at `-O2` and higher in gcc – M.M Nov 13 '17 at 03:07
  • @R..GitHubSTOPHELPINGICE could you elaborate on that further please? – Nubcake Sep 09 '20 at 12:10
  • There are multiple things wrong with this answer. (1) `recv` has four arguments. (2) There’s no need for type punning via a union here. (3) But we also don’t need to cast, contrary to what a previous comment said. (4) In fact, the conventional usage is as simple as `SocketMsgToRecv msg; ssize_t ret = recv(socket, &msg, sizeof msg, flags);` — and of course we always need to handle errors. (I realise this is an old answer but it’s one of the top hits on Google.) – Konrad Rudolph Feb 15 '22 at 14:56
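
The approach described in the last comment, receiving straight into the struct, might look roughly like the sketch below. It assumes a POSIX stream socket and a sender with the same struct layout and byte order; recvMsg is a hypothetical helper name, and the loop is there because recv on a stream may return fewer bytes than requested.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Same layout as the question's SocketMsgToRecv.
struct SocketMsgToRecv
{
    int a;
    int b;
};

// Hypothetical helper: read exactly one message from a stream socket
// directly into `msg`. Returns true on success, false on any recv error
// (including interruption by a signal) or if the peer closes early.
bool recvMsg(int fd, SocketMsgToRecv& msg)
{
    assert(fd >= 0);
    // Writing into an object's storage through char* is allowed by the
    // aliasing rules; the object itself stays a SocketMsgToRecv.
    char* dst = reinterpret_cast<char*>(&msg);
    std::size_t got = 0;
    while (got < sizeof msg) {
        ssize_t n = recv(fd, dst + got, sizeof msg - got, 0);
        if (n <= 0)
            return false; // error or orderly shutdown
        got += static_cast<std::size_t>(n);
    }
    return true;
}
```

No union and no cast of a char buffer to a struct pointer is involved; the only cast goes from the struct to char*, which is the direction the question's first quote permits.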