-1

I have a situation in a legacy code with a large field of a structure being split into two sub-fields. For example, a uint32 is split into two uint16's:

typedef struct
{
    uint16 myVar_H;
    uint16 myVar_L;
} MyStruct;

Until now these fields were read from a uint32 variable via shifting to read each uint16 portion. Now we want to optimize this so that we read the entire uint32 by casting the first field's address to a uint32:

MyStruct s;
*((uint32*)(&s.myVar_H)) = myValue;

This works for my compiler. The question is if this is legal and defined in C89 (or other C standards)?

Note: changing the structure is another option, but since this is an ancient legacy code based on a proprietary protocol I'd rather not touch the structures definition.

Edit: I am using a big-endian MCU (a ColdFire), so this works correctly in the endianess regard.

Eli Iser
  • 2,788
  • 1
  • 19
  • 29

2 Answers2

4

There's a few problems with your "optimized" solution:

  1. It violates the strict aliasing rule.
  2. It will not work on little-endian machines (the immense majority of computers nowadays), because you are not reordering correctly the high and low fields.
  3. There is no guarantee that myVar_H and myVar_L are consecutive in memory: the compiler is allowed to add padding between the two fields.

The legal and platform-independant way of achieving what you want is to keep your previous solution with shifting. It shouldn't bring any performance issues.

MyStruct s;
// Reading
uint32 u = (uint32)s.myVar_H << 16 | s.myVar_L; // The cast is important
// Writing
s.myVar_H = myValue >> 16;
s.myVar_L = myValue % 0xFFFF;

Edit following comment: you say you work on big endian machines so memory order is not a problem. In that case, the "most legal" thing you may do is make your struct a union with an anonymous struct (not actual C89, it's a gcc extension). This way you don't break strict aliasing and the new definition doesn't break any existing code.

typedef union // Your type is now a union
{
    struct
    {
         uint16 myVar_H;
         uint16 myVar_L;
    }; // Anonymous struct, legal C11 but gcc extension in C89

    uint32 myVar;
} MyStruct;

s.myVar = myValue; // Safe
Community
  • 1
  • 1
user703016
  • 37,307
  • 8
  • 87
  • 112
  • Incidentally, I am using a big-endian MCU (a ColdFire), so that's okay. Padding is a risk, of course, which can be resolved by forcing a padding size of 1 (via a pragma). – Eli Iser Apr 02 '14 at 12:24
  • @presiuslitelsnoflek After reading some more on strict aliasing, I understand that the aliasing problem is referring to the `uint16` as a `uint32`, right? It's not that I include both fields in the cast to a `uint32` - this is a side effect of the cast, is it not? – Eli Iser Apr 02 '14 at 12:58
  • 1
    @Eli That's right, they are not *compatible* types. The fact that both fields are included is indeed "only" a side effect, but precisely the kind of side effects that led to the invention of the strict aliasing rule. – user703016 Apr 02 '14 at 13:05
  • The `union` version breaks the aliasing rules just the same as OP's original code did. I don't see that it gains anything over OP's version. (Of course, your original suggestion of using arithmetic is the best one, and the compiler SHOULD generate optimal code from it anyway) – M.M Apr 04 '14 at 01:35
  • @Matt That's right, in theory the `union` also breaks aliasing and is UB according to the standard. However, since the OP is willing to take other risks (assuming big endianness and absence of padding), and since [the union trick is well defined under GCC](http://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html#Structures-unions-enumerations-and-bit-fields-implementation), I thought this would be a more or less acceptable solution under OP's constraints. – user703016 Apr 04 '14 at 07:29
-2

Regarding your specific question:

The question is if this is legal and defined in C89?

the answer is no: it violates the strict aliasing rule. See What is the strict aliasing rule?. The compiler is free to do whatever it wants, including completely ignoring the statement:

   *((uint32*)(&s.myVar_H)) = myValue;

A safer approach is to use memcpy; your code snippet becomes:

   MyStruct s;
   memcpy(&s.myVar_H, &myValue, sizeof myValue);

Of course, in this particular instance, &s.myVar_H is the same as &s.

This assumes myValue has to have the right size and format (e.g. it can't be a char or a float). It this is not the case, and if you have C99's compound literals, you can write:

   memcpy(&s.myVar_H, (uint32[]){myValue}, sizeof(uint32));

This is ugly, but you can hide its ugliness in a macro:

   #define LEGACY_CPY32(dst, src) memcpy(&(dst), (uint32[]){src}, sizeof(uint32))

But these suggestion will avoid only the strict-aliasing issue. Litte-endian problems and structure padding problems will still be present, and your code could break if you switch machines and/or compilers, or even just change compiler switches.


Update #1: Of course no one would ever write new code like this. But I am assuming the OP's goal is to update working, legacy code with the minimal number of changes, rather than doing a complete rewrite.

See also:


Update #2: Maybe I can present a more convincing case? For any construct, one can argue in terms of:

  • legality
  • efficiency
  • elegance

For the third criteria, elegance, my suggestion loses hands down, compared with e.g. union punning.

For the second criteria, efficiency, John Regehr in Embedded in Academia, "Type Punning, Strict Aliasing, and Optimization" (June 2013) states regarding a similar issue:

Compilers such as GCC 4.8.1 and Clang ~3.3 (both for x86-64 Linux) fail to generate good code for c2 [an example of the type punning feature of unions] ... [and give] crappy object code

whereas in his example c5 which uses memcpy:

Both compilers understand memcpy deeply enough that we get the desired [optimal] object code.

In my opinion c5 is the easiest code to understand out of this little batch of functions because it doesn't do the messy shifting and also it is totally, completely, obviously free of complications that might arise from the confusing rules for unions and strict aliasing. It became my preferred idiom for type punning a few years ago when I discovered that compilers could see through the memcpy and generate the right code.

Finally, let us look at the first criteria, legality. Regehr states regarding union punning (emphasis added):

Unfortunately this code [c2, using union punning] is also undefined by modern C/C++ dialects (or maybe just unspecified, I'm not totally sure).

I am reluctant to disagree with Regehr, but the consensus on Stack Overflow seems to be that union punning has been valid since C99; see:

But it was not defined behavior in C89, which the OP specifically asks about.

Finally note the OP requests that:

Since this is an ancient legacy code based on a proprietary protocol I'd rather not touch the structures definition

The memcpy solution leaves the header files untouched, which is advantageous if you are working with a team of touchy programmers. And you can easily back out all of your changes by redefining the macro LEGACY_CPY32 to be the previous "incorrect"-but-working functionality:

 #define LEGACY_CPY32(dst, src) (*((uint32*)(&(dst))) = (src))
Community
  • 1
  • 1
Joseph Quinsey
  • 9,553
  • 10
  • 54
  • 77
  • How do you account for padding in the struct across different compilers and hardware platforms? Note, I've downvoted for your use of the word "safest." Had you not included this, I wouldn't have downvoted. As it stands, this is outright incorrect. Please see presius's answer. – San Jacinto Apr 02 '14 at 12:17
  • @SanJacinto: Thank you for your suggestion; I've edited my answer. And I believe presius's answer is backwards. – Joseph Quinsey Apr 02 '14 at 12:25
  • @JosephQuinsey: Why does this avoid anti-aliasing issues? The behavior seems similar - basically treat `&s.myVar_H` as a pointer to a `uint32` or to 4 bytes. The "danger" of referring to two consecutive `struct` fields as a single field is the same. – Eli Iser Apr 02 '14 at 12:27
  • @JosephQuinsey Agreed, he had it backwards but the semantics were correct. I still don't like your solution. There's nothing inherently "unsafe" about bit shifting and byte arithmetic other than human error. Your solution includes built-in problems and isn't really much of a solution. – San Jacinto Apr 02 '14 at 12:28
  • Your `memcpy` variation still has the same aliasing problem, just in a different form. You read bytes which were stored as one type, via an lvalue of a different type. It is no better or worse than the original. (In cases where there are alignment concerns, the memcpy version is "better" in that it avoids alignment problems, however both versions still violate the aliasing rules) – M.M Apr 04 '14 at 01:29
  • BTW, `(uint32*)(&s.myVar_H)` can be replaced by `(uint32*)&s` everywhere - structs are guaranteed to have no initial padding – M.M Apr 04 '14 at 01:30
  • @MattMcNabb: (to the BTW) Of course. And I _do_ say this. But I'm assuming that the OP is just giving us a toy example, and that his real code is much more complicated. – Joseph Quinsey Apr 04 '14 at 01:42
  • @MattMcNabb: (to your first comment) What you are saying appears the similar to the question [C aliasing rules and memcpy](http://stackoverflow.com/q/3275353), in which the OP says that `memcpy` _is_ ok, and asks _why_? But he doesn't get a good answer, I think, even from a 200K user. – Joseph Quinsey Apr 04 '14 at 02:02
  • The `memcpy` itself doesn't break the rule *yet*, but the rule is broken when you access the value via the expression `p`, in that example. The top answer addresses this with "as long as you don't access p afterwards" and " that might cause undefined behavior depending on the bitpattern". The standard description of that is "accessing an object via an lvalue not of compatible type", or something similar. It is still aliasing the same bytes in both cases, just in the `memcpy` case the bytes were moved before they were read by an lvalue expression of the same type in both cases. – M.M Apr 04 '14 at 02:55