0

I'm reading some code

I'm reading a tutorial that says:

union lets us treat the data as either separate fields or a single byte array.

Could somebody explain to me how a union lets you treat the data as a single byte array? My understanding of unions is that they let you store different data types in the same memory location, but only one member can contain a value at any given time.

    union ethframe
    {
        struct
        {
            struct ethhdr    header;
            unsigned char    data[1500];
        } field;
        unsigned char    buffer[1514];
    };
timrau
  • a byte array like that `buffer` one you have there? – Chris Turner Dec 22 '17 at 15:59
  • Possible duplicate of [union initialization in c](https://stackoverflow.com/questions/9824641/union-initialization-in-c) – PRATEEK BHARDWAJ Dec 22 '17 at 16:02
  • Any type can be aliased as a `char` array and accessed as one. But once you modify it using such an alias, the behavior is undefined, as you are violating the *strict aliasing rule*. – Eugene Sh. Dec 22 '17 at 16:06
  • @EugeneSh: Does strict aliasing apply when you use `char` to access? – Jonathan Leffler Dec 22 '17 at 16:28
  • OT: That magic number 1514 is dangerous here - it may be incorrect due to misunderstanding `header` and padding. `struct field_s { struct ethhdr header; unsigned char data[1500]; }; union ethframe { struct field_s field; unsigned char buffer[sizeof (struct field_s)]; };` is better. – chux - Reinstate Monica Dec 22 '17 at 16:33
  • @JonathanLeffler When you use it to *modify*, I think it does. How can you guarantee you won't turn it into some trap value? – Eugene Sh. Dec 22 '17 at 16:40
  • @EugeneSh. I do not believe Strict Aliasing comes into play with unions. See this answer: https://stackoverflow.com/a/11640603/8513665 – Christian Gibbons Dec 22 '17 at 16:42
  • @ChristianGibbons I am aware of this change, and the answer you are pointing to states *An unspecified value that could be a trap is read when the union members are of different size.* - so yeah, the behavior would not be *undefined* but *unspecified*. – Eugene Sh. Dec 22 '17 at 16:48

2 Answers

1

Don't know where you got the idea that only one member can contain a value at any given time, but it's wrong. Take this example:

union example
    {
    int a;
    char b[4];
    };

If you assign a value to `a` and then examine the contents of `b`, you'll find that the 4 array elements hold the 4 bytes of the int `a` (assuming a 4-byte `int`). If you change one of them, you'll indirectly change the value of `a`.
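
A minimal sketch of that experiment, under a couple of assumptions: it swaps `int` and `char` for `uint32_t` and `unsigned char` so the sizes are fixed, and the helper names `bytes_match` and `poke_byte` are made up for illustration. Which byte of `b` maps to which part of `a` depends on the machine's byte order, so the sketch avoids relying on it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the union above, but with fixed-width types so the
   sizes do not depend on the platform's int (an assumption made for
   this sketch). */
union example {
    uint32_t      a;
    unsigned char b[sizeof(uint32_t)];
};

/* Returns 1 if the bytes seen through `b` are exactly the bytes that
   represent `v` in memory, 0 otherwise. */
int bytes_match(uint32_t v)
{
    union example e;
    e.a = v;
    return memcmp(e.b, &v, sizeof v) == 0;
}

/* Stores `v` through `a`, overwrites one byte through `b`, and returns
   what `a` reads back. The result depends on byte order. */
uint32_t poke_byte(uint32_t v, size_t index, unsigned char byte)
{
    union example e;
    assert(index < sizeof e.b);
    e.a = v;
    e.b[index] = byte;
    return e.a;
}
```

Calling `poke_byte(0, 0, 0xFF)` gives `0xFF` on a little-endian machine and `0xFF000000` on a big-endian one, which is the "indirect change" described above.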

Chris Turner
  • I believe the assertion about the only one member is related to the strict aliasing rule. That if it was initialized using some specific member access, it cannot be accessed using the others (unless these are `char` array). – Eugene Sh. Dec 22 '17 at 16:08
  • @Chris Turner, thanks for your helpful response - I got it from this tutorial (which I thought was reliable) [link](https://www.tutorialspoint.com/cprogramming/c_unions.htm) – ariane17 holland Dec 22 '17 at 16:10
  • I think what the tutorial was trying to explain is that in some cases, accessing a union member that is a different type to the one you've assigned to will give you unexpected results. Like if you add `float c;` to my example and assign 20.0 to it, you won't get 20 from accessing `a`. – Chris Turner Dec 22 '17 at 16:20
  • The idea that only one member of a union can contain a value at any given time is not wrong. It is explicitly stated in the C standard. C 2011 [N1570] 6.7.2.1 16 says “The value of at most one of the members can be stored in a union object at any time.” Reading characters through a member other than the last one used to store a value is technically reinterpreting the bytes representing the value, which is called type punning. The standard defines these in technical ways, which can affect behavior, so it is not generally correct to consider the union as holding multiple values. – Eric Postpischil Dec 22 '17 at 16:36
0

The normal use of a union is to store (and retrieve) only one value at a time.

C is somewhat of a medium-level language. It supports types with various features (integer, floating-point, pointers, arrays, structures, unions, bit fields, combinations of these, and so on), and it also allows access to the bytes that represent objects of those types.

In C, you can convert a pointer to an object to a pointer to a character type and use that pointer to inspect the bytes of an object.
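
As a rough sketch of that pointer conversion (the function names `dump_bytes` and `count_nonzero_bytes` are hypothetical, chosen for this example), the code below views any object through a `const unsigned char *` and walks its bytes in memory order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Print the n bytes of the object at p as hex, in memory order. */
void dump_bytes(const void *p, size_t n)
{
    assert(p != NULL);
    const unsigned char *bytes = p;   /* character-type view of the object */
    for (size_t i = 0; i < n; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
}

/* Count how many bytes of the object's representation are non-zero. */
size_t count_nonzero_bytes(const void *p, size_t n)
{
    assert(p != NULL);
    const unsigned char *bytes = p;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] != 0)
            count++;
    }
    return count;
}
```

For example, `double d = 1.0; dump_bytes(&d, sizeof d);` prints the bytes of `d`; the order and values you see depend on the platform's floating-point format and byte order.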

You are also allowed to store a value to one member of a union and then read the contents using another member. When you do this, the bytes in the union will be reinterpreted as if they represented a value in the type of the member being used.
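
Here is a small sketch of that reinterpretation, assuming `float` is a 32-bit IEEE 754 type (true on most current platforms, but not guaranteed by the C standard). The union `float_pun` and the two helpers are illustrative names, not anything from the question.

```c
#include <assert.h>
#include <stdint.h>

/* The sketch only makes sense if both members have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

union float_pun {
    float    f;
    uint32_t u;
};

/* Store a float, then read the same bytes back as an integer. */
uint32_t float_bits(float value)
{
    union float_pun p;
    p.f = value;
    return p.u;   /* the bytes of the float, reinterpreted as uint32_t */
}

/* The reverse direction: store an integer, read the bytes as a float. */
float bits_to_float(uint32_t bits)
{
    union float_pun p;
    p.u = bits;
    return p.f;
}
```

Under the IEEE 754 assumption, `float_bits(20.0f)` is `0x41A00000`, not 20: the bytes are reinterpreted, the value is not converted.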

Serious problems can arise with your program if you do this incorrectly. Inspecting or reinterpreting the bytes of an object should be done only for special purposes. For example, special-purpose math library code targeted for specific hardware may need to manipulate the bytes of floating-point objects. Some code for input/output may need to package objects as streams of bytes that are communicated to other systems or may need to receive streams of bytes and reinterpret them as other objects.
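
The input/output case is roughly what the `ethframe` union in the question is for. The sketch below shows the idea under stated assumptions: `demo_hdr` is a hypothetical stand-in for `struct ethhdr` built only from `unsigned char` arrays, the buffer is sized with `sizeof` rather than a magic number (as suggested in the comments on the question), and `build_frame` is an invented helper. It fills the frame through the structured view; the same bytes can then be handed to an I/O routine through `buffer`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ethhdr: all members are byte
   arrays, so no padding is expected between them. */
struct demo_hdr {
    unsigned char dst[6];
    unsigned char src[6];
    unsigned char proto[2];   /* EtherType, most significant byte first */
};

struct demo_fields {
    struct demo_hdr header;
    unsigned char   data[1500];
};

union demo_frame {
    struct demo_fields field;                           /* structured view */
    unsigned char      buffer[sizeof(struct demo_fields)]; /* raw byte view */
};

/* Fill the frame through the structured view. Returns how many bytes
   of frame->buffer make up the finished frame. */
size_t build_frame(union demo_frame *frame,
                   const unsigned char dst[6], const unsigned char src[6],
                   uint16_t ethertype, const void *payload, size_t len)
{
    assert(frame != NULL);
    assert(len <= sizeof frame->field.data);

    memcpy(frame->field.header.dst, dst, 6);
    memcpy(frame->field.header.src, src, 6);
    frame->field.header.proto[0] = (unsigned char)(ethertype >> 8);
    frame->field.header.proto[1] = (unsigned char)(ethertype & 0xFF);
    if (len > 0)
        memcpy(frame->field.data, payload, len);

    return sizeof frame->field.header + len;
}
```

After `build_frame` returns `n`, the first `n` bytes of `frame.buffer` are the header followed by the payload, ready to pass to whatever send function the program uses. Reading them through `buffer` is fine because it is an `unsigned char` array.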

For normal use of a union, you should only read the last member written.

Eric Postpischil