Is this a proper usage of union?

Question

I want to have named fields rather than indexed fields, but for some uses I have to iterate over the fields. Dumb simplified example:

struct named_states {float speed; float position;};

#define NSTATES (sizeof(struct named_states)/sizeof(float))
union named_or_indexed_states {
   struct named_states named;
   float indexed[NSTATES];
};
...
union named_or_indexed_states states,derivatives;
states.named.speed = 0;
states.named.position = 0;
...
derivatives.named.speed = acceleration;
derivatives.named.position= states.named.speed;
...
/* This code is in a generic library (consider nstates=NSTATES) */
for(i=0;i<nstates;i++)
    states.indexed[i] += time_step*derivatives.indexed[i];

This avoids a copy from the named struct to the indexed array and vice versa, replacing it with a generic solution that is easier to maintain (there are very few places to change when I extend the state vector). It also works well with the various compilers I tested (several versions of gcc/g++ and MSVC).
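For readers who want to try the pattern, here is a self-contained C sketch of the approach described in the question. The function names `euler_step` and `advance` are hypothetical, and the indexed view assumes `struct named_states` has no padding, which holds on common gcc and MSVC targets but is not guaranteed by the standard.

```c
#include <stddef.h>

/* Named view of the state vector, as in the question. */
struct named_states { float speed; float position; };

/* Assumes no padding: the struct is exactly NSTATES floats. */
#define NSTATES (sizeof(struct named_states) / sizeof(float))

/* Named and indexed views of the same storage. */
union named_or_indexed_states {
    struct named_states named;
    float indexed[NSTATES];
};

/* Hypothetical generic library routine: one explicit Euler step,
   x[i] += dt * dx[i], with no knowledge of the field names. */
static void euler_step(float *x, const float *dx, size_t n, float dt)
{
    for (size_t i = 0; i < n; i++)
        x[i] += dt * dx[i];
}

/* Hypothetical domain-specific code: equations use names only. */
static void advance(union named_or_indexed_states *s, float acceleration, float dt)
{
    union named_or_indexed_states d;
    d.named.speed = acceleration;      /* d(speed)/dt    */
    d.named.position = s->named.speed; /* d(position)/dt */
    euler_step(s->indexed, d.indexed, NSTATES, dt);
}
```

The design choice is that only `advance` knows the names, while `euler_step` sees a plain `float` array; the whole scheme stands or falls with the no-padding assumption.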

But theoretically, as I understand it, it does not strictly adhere to proper union usage, since I wrote the named field and then read the indexed field, and I'm not at all sure we can say that the struct fields and the array elements share the same layout...

Can you confirm that it's theoretically bad (non-portable)?

Would I be better off using a cast, memcpy(), or something else?

Theory aside, from a pragmatic point of view, is there any REAL portability issue (some incompatible compiler, exotic struct alignment, planned future evolutions...)?

EDIT: your answers call for a bit more clarification of my intentions, which were:

to let programmers focus on domain-specific equations and free them from maintaining conversion functions (I don't know how to write a generic one, apart from cast or memcpy tricks, which do not seem more robust)
to add a bit more coding safety by using a struct (fully controlled by the compiler) rather than arrays (declaration and access are more prone to programmer mistakes)
to avoid polluting the namespace too much with enum or #define

I need to know

how portable/dangerous my steering away from the standard is (maybe some compiler with aggressive inlining will use a full-register solution and avoid any memory exchange, ruining the trick),
and whether I missed a standard solution that addresses the above concerns in part or in whole.

There may be padding bytes in the struct that do not get reflected in the array ... — pmg, Aug 14 '12 at 17:19
That's certainly not the right syntax for arrays (`indexed` inside the union). — eq-, Aug 14 '12 at 17:24
@Griwes, this one here is not a duplicate of the one you are linking to. There, the `union` is done with a `char` array to inspect the individual bytes. This always has well defined behavior. Here, things are a bit more subtle since as pmg says, there could in theory be padding between the fields of the `struct`. — Jens Gustedt, Aug 14 '12 at 18:57

score 3 · Accepted Answer · answered Aug 14 '12 at 17:26


There's no requirement that the two fields in named_states line up the same way as the array elements. There's a good chance that they do, but you've got a compiler dependency there.
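One layout-independent alternative, sketched in C and not part of this answer: keep the named struct and iterate over a table of `offsetof` values, so any padding between fields is skipped rather than assumed away. The names `state_offsets` and `euler_step_named` are hypothetical; the offset table is the single place to update when a state is added.

```c
#include <stddef.h> /* offsetof, size_t */

struct named_states { float speed; float position; };

/* Byte offset of every float field; padding, if any, is never touched. */
static const size_t state_offsets[] = {
    offsetof(struct named_states, speed),
    offsetof(struct named_states, position),
};
#define NSTATES (sizeof state_offsets / sizeof state_offsets[0])

/* Hypothetical generic Euler step over all named fields:
   x.field += dt * dx.field, addressed through the offset table. */
static void euler_step_named(struct named_states *x,
                             const struct named_states *dx, float dt)
{
    for (size_t i = 0; i < NSTATES; i++) {
        float *xi = (float *)((char *)x + state_offsets[i]);
        const float *dxi = (const float *)((const char *)dx + state_offsets[i]);
        *xi += dt * *dxi;
    }
}
```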

Here's a simple implementation in C++ of what you're trying to do:

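struct named_or_indexed_states {
    // Bind the named references to the array elements.
    named_or_indexed_states() : speed(indexed[0]), position(indexed[1]) { }
    // The implicit copy constructor would bind the copy's references to the
    // source's array, so rebind them to this object's own array instead.
    named_or_indexed_states(const named_or_indexed_states &other)
        : speed(indexed[0]), position(indexed[1])
    { for (int i = 0; i < 2; ++i) indexed[i] = other.indexed[i]; }
    float &speed;
    float &position;
    float indexed[2];
};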

If the size increase caused by the reference members is too much, use accessors instead:

struct named_or_indexed_states {
    float indexed[2];
    float& speed() { return indexed[0]; }
    float& position() { return indexed[1]; }
};

The compiler will have no problem inlining the accessors, so reading or writing speed() and position() will be just as fast as if they were member data. You still have to write those annoying parentheses, though.


Pete Becker


Yes, I'm aware that the compiler may pad the structure members, but I know of no architecture where this would give any advantage with just 32/64-bit float/double members. – aka.nice Aug 14 '12 at 19:49
Clever, but if I remove a state, I have to rewrite the ctor or all the following accessors... Also, the dependency on the size of indexed is hand-crafted. +1 though, because it does not pollute the namespace and because the necessary changes are concentrated in a single location. – aka.nice Aug 14 '12 at 19:53
I accept this answer because I like the C++ example, but IMO this should be auto-generated code. Griwes' confirmation of the non-standardness was useful too, and Coder_Dan's pure C solution is correct too. For now, I will keep the union because I need to maintain some old C code, but I will add tests and documentation. – aka.nice Aug 16 '12 at 17:29

Griwes · Answer 2 · 2012-08-14T17:42:01.527


Only accessing the last-written member of a union is well-defined; as far as standard C (or C++) alone is concerned, the code you presented has undefined behavior. It may work, but it's the wrong way to do it. It doesn't really matter that the struct uses the same type as the array: there may be padding involved, as well as other invisible tricks used by the compiler.

Some compilers, like GCC, do define it as an allowed way to achieve type punning. Now the question arises: are we talking about standard C (or C++), or about GNU or other extensions?

As for what you should use - proper conversion operators and/or constructors.

edited Aug 14 '12 at 17:42

answered Aug 14 '12 at 17:21

Griwes


Do you have a reference for that? Saying it's undefined behaviour? – Luchian Grigore Aug 14 '12 at 17:25
Oh, if you do, feel free to answer - http://stackoverflow.com/questions/11373203/accessing-inactive-union-member-undefined – Luchian Grigore Aug 14 '12 at 17:26
@LuchianGrigore, I think it was already answered in that question: as a union has *at most one active member*, it stores *at most one value*, so you cannot read a value that isn't there. The standard doesn't seem explicit about that, but it also doesn't define what happens when you try to read a non-active union member. I think we can call it *undefined*, as it's not *defined* (using the plain meaning of the word *undefined*). – Griwes Aug 14 '12 at 17:33
It's undefined behavior in the C standard, but all the C compilers I know of do allow `union` for type punning. – ephemient Aug 14 '12 at 17:35
@Griwes I don't know about that. Maybe it's unspecified? What does active mean?... it's not that simple. If it was, I wouldn't have given a bounty on that... – Luchian Grigore Aug 14 '12 at 17:36
@ephemient, there are many undefined behaviors that are allowed and even have consistent behavior on many compilers, that's no proof. – Griwes Aug 14 '12 at 17:36
@LuchianGrigore, I guess that's the whole problem: wording. Should we standardize what "undefined" and "unspecified" mean? Also, the C++ standard defines what "active" means: "[at most one] stored value". – Griwes Aug 14 '12 at 17:37
@Griwes I mean that common compilers _define_ behavior outside the C specification. For example, [GCC](http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type_002dpunning) documents "type-punning is allowed, provided the memory is accessed through the union type". – ephemient Aug 14 '12 at 17:39
@ephemient, heh. During such discussions, I've always had an impression that only standard definitions are being discussed ;) – Griwes Aug 14 '12 at 17:40
It is not undefined behavior in C. There was wording in C99 that could be interpreted as such, but that was unintentional. It was later corrected in a corrigendum. In C, type punning is one of the major intended uses of `union`s. That said, accessing data through another member *could* lead to UB if the data happens to be a trap representation for that (new) type. – Jens Gustedt Aug 14 '12 at 18:50
@Griwes that's what I understood from other posts. I must think in terms of risk/benefit and wish to be able to evaluate the risk; see my edit. Type punning is compiler dependent and is what I should verify, +1 for naming it. – aka.nice Aug 14 '12 at 20:41
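To illustrate the distinction discussed in these comments, a minimal C sketch assuming a 32-bit IEEE-754 `float`: reading the bits through a union (the C idiom described above) and through `memcpy`, which is also valid in C++. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size (32-bit float). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits");

/* Type punning through a union: write one member, read another. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* The same reinterpretation by copying the object representation. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Both functions return the same bit pattern; with an optimizing compiler the `memcpy` version typically compiles to a plain register move, so it costs nothing extra.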
It seems that type punning with union is well defined in C99/C++11 http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified – aka.nice Jan 21 '15 at 14:54
@aka.nice: that question only talks about C, hence C99 and C11, **not** C++. – Griwes Jan 21 '15 at 15:24

score 1 · Answer 3 · answered Aug 14 '12 at 17:29


This may be a little old-fashioned, but what I would do in this situation is:

enum {
    F_POSITION,
    F_SPEED,
    F_COUNT
};

float states[F_COUNT];

Then you can reference them as: states[F_POSITION] and states[F_SPEED].
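A minimal sketch of how the enum-indexed array keeps a generic update loop while the equations still use names. The `derivatives` and `step` helpers are hypothetical, modelled on the question's speed/position example.

```c
enum { F_POSITION, F_SPEED, F_COUNT };

/* Hypothetical domain-specific equations, written with names:
   d(position)/dt = speed, d(speed)/dt = acceleration. */
static void derivatives(const float states[F_COUNT], float acceleration,
                        float d[F_COUNT])
{
    d[F_POSITION] = states[F_SPEED];
    d[F_SPEED] = acceleration;
}

/* Generic explicit Euler step over all F_COUNT states. */
static void step(float states[F_COUNT], float acceleration, float dt)
{
    float d[F_COUNT];
    derivatives(states, acceleration, d);
    for (int i = 0; i < F_COUNT; i++)
        states[i] += dt * d[i];
}
```

Adding a state means adding one enumerator before `F_COUNT` and one equation; the loop needs no change.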

That's one way that I might write this. I'm sure that there are many other possibilities.


Coder_Dan


Good. I already used this trick, which is better than a preprocessor #define. The reason I preferred a struct is that it minimizes namespace pollution and eases the declaration of states compared to float states[F_COUNT]. – aka.nice Aug 14 '12 at 20:01

Thanks @aka.nice. My personal opinion is that you could probably get away with using a plain struct containing two floats and cast it to an array of floats. It all depends on which platforms and compilers you might be planning to support in the future. If you decided to do this then I would advocate using 'static_assert' with 'offsetof' to ensure that the floats are positioned in memory where you expect. This is somewhat as per my question here: http://stackoverflow.com/questions/11313534/in-vs2010-is-it-possible-to-use-static-assert-to-verify-an-assumption-about-the. – Coder_Dan Aug 16 '12 at 09:08
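A minimal C11 sketch of the suggested compile-time checks, using `static_assert` from `<assert.h>` and `offsetof` from `<stddef.h>`. If any assertion fails, the build stops instead of the indexed view silently reading padding. `NSTATES` is kept in sync by hand here (an assumption of this sketch), so the size check actually detects padding.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */

struct named_states { float speed; float position; };

#define NSTATES 2 /* number of float fields, maintained by hand */

/* No padding anywhere: the struct is exactly NSTATES floats. */
static_assert(sizeof(struct named_states) == NSTATES * sizeof(float),
              "named_states contains padding or an unexpected field count");

/* Each named field sits where the matching array element would. */
static_assert(offsetof(struct named_states, speed) == 0 * sizeof(float),
              "speed is not element 0");
static_assert(offsetof(struct named_states, position) == 1 * sizeof(float),
              "position is not element 1");
```

These checks only validate the layout assumption; they do not by themselves make reading the struct through a cast `float *` strictly conforming.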
Thanks for offsetof, it's useful. – aka.nice Aug 16 '12 at 09:29
