14

I have a structure like the one below.

struct result{
    int a;
    int b;
    int c;
    int d;
};

and a union like the one below.

union convert{
   int arr[4];
   struct result res;
};

and I type-pun as shown below.

  int arr1[4] = {1,2,3,5};
  union convert *pointer = (union convert *) arr1; // Here is my question, is it well defined?

  printf("%d %d\n", pointer->res.a, pointer->res.b);
AndyG
KBlr

4 Answers

11

pointer->res.a is fine but the behaviour of pointer->res.b is undefined.

There could be an arbitrary amount of padding between the a and b members.

Some compilers allow you to specify that there is no padding between members but of course then you are giving up portability.
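As a minimal sketch of how "ensuring there is no padding" could be checked portably, the layout can be verified at compile time with static_assert and offsetof (C11). The helper name result_from_array is hypothetical, and memcpy is used here instead of the pointer cast from the question, so this sketch only covers the padding concern:

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

struct result {
    int a;
    int b;
    int c;
    int d;
};

/* The build fails if the implementation inserts padding anywhere in the struct. */
static_assert(offsetof(struct result, b) == 1 * sizeof(int), "padding before b");
static_assert(offsetof(struct result, c) == 2 * sizeof(int), "padding before c");
static_assert(offsetof(struct result, d) == 3 * sizeof(int), "padding before d");
static_assert(sizeof(struct result) == 4 * sizeof(int), "trailing padding");

/* Hypothetical helper: copy the bytes of an int[4] into a struct result.
   With the layout checks above, each member lines up with one array element. */
struct result result_from_array(const int src[4])
{
    struct result r;
    memcpy(&r, src, sizeof r);
    return r;
}
```

If any assertion fires, the code does not compile on that implementation, which is usually preferable to silently reading the wrong member.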

Bathsheba
  • Thank you! If I ensure that there will be no padding is it then well defined? – KBlr Jan 17 '19 at 14:36
  • 1
    @KBlr: Yep! Just not portable – AndyG Jan 17 '19 at 14:37
  • 3
    `pointer->res.a` accesses an object (`arr1`) through an lvalue (`pointer->res`) in violation of C 2018 6.5 7. – Eric Postpischil Jan 17 '19 at 16:13
  • @EricPostpischil one could say that the `arr1[0]` (because `arr1` is converted to a pointer to its first member before the cast) is accessed through an lvalue `pointer->res.a`, which does not violate 6.5.7, because the type of the lvalue is `int` and the accessed object's type is `int`. – Language Lawyer Jan 17 '19 at 16:35
  • @LanguageLawyer: The fact that the expression also accesses objects through permitted lvalues does not cure the violation of the rule. If a rule is broken, it is broken; that cannot be fixed by doing something else correctly. – Eric Postpischil Jan 17 '19 at 18:18
  • 1
    @EricPostpischil So the aliasing rule applies not only to the "final" expression which is actually used to access (read or modify) an object, but to all glvalue subexpressions of the "final" expression? – Language Lawyer Jan 17 '19 at 18:22
  • @LanguageLawyer: A subexpression is an expression, and 6.5 7 does not exclude them. Consider the purpose of restricting aliasing: A routine might be passed pointers to (the first elements of) arrays of different structures with identical definitions. The aliasing rules enable compilers to optimize on the assumption that the pointers do not alias. If the “final” expression could alias, it would break optimization. – Eric Postpischil Jan 17 '19 at 18:41
  • 1
    @EricPostpischil What I don't like about these aliasing rules is that they talk about an "expression used to access". I'm not sure whether that means only the "final" expression or whether all the subexpressions are also "used to access". Any authoritative references here? Also, I did not understand _arrays of different structures with identical definitions_. – Language Lawyer Jan 17 '19 at 18:59
  • 1
    @LanguageLawyer: `struct A { int x; };` and `struct B { int x; };` have identical definitions but are different types. The C standard says they are not compatible, and one may not alias the other. If only the `int x` mattered, the aliasing rule would be useless. It is the fact that one structure cannot alias the other that enables optimization based on the aliasing rules. The aliasing rules must apply to the structures lvalues, not just the `int`. – Eric Postpischil Jan 17 '19 at 20:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186883/discussion-between-language-lawyer-and-eric-postpischil). – Language Lawyer Jan 17 '19 at 20:17
  • @EricPostpischil _A subexpression is an expression, and 6.5 7 does not exclude them_ Ok, then if `p` is a variable with the type pointer to `int` (`int*`) (and actually points to an `int` object), accessing this `int` object through `p` as `*p` is UB because the subexpression `p` is an lvalue of type `int*`, and 6.5/7 does not allow accessing `int` using lvalue of type `int*`? LMAO. – Language Lawyer Feb 02 '19 at 03:28
  • 1
    @LanguageLawyer: In `*p`, the lvalue `p` has type `int *` and is used to access an object that has type `int *`. In `pointer->res.a`, `pointer->res` has type `struct result` and is used to access an object that has type `int [4]`. The rule about accessing an object through a lvalue expression of appropriate type means the lvalue expression type must be appropriate for its object type, not that an expression containing an lvalue somewhere in it can only access objects of that type throughout the entire expression. If you have further questions, you ought to pose a question, not ask in comments. – Eric Postpischil Feb 02 '19 at 13:14
  • @EricPostpischil http://port70.net/~nsz/c/c11/n1570.html#3.1p1 "**access**: to read or modify the value of an object". The `pointer->res` lvalue subexpression is not used to read or modify the object. The `pointer->res.a` lvalue is used to access (read) an object. – Language Lawyer Feb 02 '19 at 13:27
  • @LanguageLawyer: If you have further questions, you ought to pose a question, not ask in comments. – Eric Postpischil Feb 02 '19 at 13:50
6

Is this type punning well defined?

struct result{
    int a,b,c,d;
};

union convert {
   int arr[4];
   struct result res;
};

int arr1[4] = {1,2,3,5};
union convert *pointer = (union convert *) arr1; 

(union convert *) arr1 risks alignment failure.

"A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined." C11dr §6.3.2.3 7

There is no requirement that union convert and int share the same alignment. The alignment requirement of union convert may exceed that of int, for example.

Consider this possibility: arr1[] lives on int street, where all addresses are multiples of 4. union and struct friends live on "multiple of 8" street. arr1[] might have address 0x1004 (not a multiple of 8).

In 2019, alignment failures are more commonly seen with char (needing 1) and other types needing 2 or more. In OP's select case, I doubt a real platform will have alignment issues, yet incorrect alignment remains possible.
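A rough sketch of how the alignment concern could be inspected, assuming C11 <stdalign.h> and an implementation that provides uintptr_t (it is optional) and converts pointers to integers as plain addresses (implementation-defined, though common). The helper name aligned_for_convert is hypothetical:

```c
#include <assert.h>
#include <stdalign.h>  /* alignof (C11) */
#include <stdint.h>    /* uintptr_t (optional type) */

struct result { int a, b, c, d; };
union convert { int arr[4]; struct result res; };

/* Hypothetical helper: returns 1 when p meets the alignment requirement of
   union convert, 0 otherwise. Assumes the integer value of the converted
   pointer is the address, which the standard leaves implementation-defined. */
int aligned_for_convert(const void *p)
{
    return (uintptr_t)p % alignof(union convert) == 0;
}
```

Declaring the array as `alignas(union convert) int arr1[4]` would address the alignment requirement only; the padding and aliasing concerns raised elsewhere on this page are separate.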

This type punning is not well defined.


Additional concerns

Other answers and comments discuss padding issues, which identify further trouble.

@Eric Postpischil comment about improper access with pointer->res.a adds more reasons to consider this UB.

chux - Reinstate Monica
  • how is it possible for `union convert` to have different padding? If `convert ` and `result` hold only `int`s and `arr1` is an array of `int`, doesn't it mean they all will have the same padding no matter what? – Hrisip Nov 26 '20 at 13:31
  • @Hrisip C does not require 2 `int` members of `struct result` to be adjacent (no padding). An `int` array's elements are without intervening padding. It is this difference, not the padding between `struct result` and `union convert` that is the pedantic concern. – chux - Reinstate Monica Nov 26 '20 at 13:48
  • Why would there be something in between the members? It seems insane – Hrisip Nov 26 '20 at 16:00
  • @Hrisip You asked how it is possible and my comment answered that. Now you pose a new question. Rather than a running discussion here on the possibilities and sanity of a C implementation, consider posting a new question with all your concerns. – chux - Reinstate Monica Nov 26 '20 at 16:14
2

C imposes no rule about how much padding is left between two consecutive members of a structure.

This is why implementations define many #pragma directives, specifically to change this behaviour.

So, as the answer of Bathsheba says, ...->b is undefined.

I answered the very same question some time ago, here.

alinsoar
1

Pointer punning is not safe. Use real union punning instead.

Assumptions: the struct is properly packed (no padding between the members)

#include <stdio.h>
#include <string.h>



struct __attribute__((packed)) result{
    int a;
    int b;
    int c;
    int d;
};

union convert{
   int arr[4];
   struct result res;
};

  volatile int arr1[4];

void foo(void)
{

  union convert cnv;

  memcpy(&cnv, (void *)arr1, sizeof(arr1));

  printf("%d %d\n", cnv.res.a, cnv.res.b);
}

Modern optimizing compilers will typically optimize out the memcpy call:

https://godbolt.org/z/4qtRIF

.LC0:
        .string "%d %d\n"
foo:
        mov     rsi, QWORD PTR arr1[rip]
        xor     eax, eax
        mov     rdi, QWORD PTR arr1[rip+8]
        mov     edi, OFFSET FLAT:.LC0
        mov     rdx, rsi
        sar     rdx, 32
        jmp     printf
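For comparison, here is a minimal sketch of punning through an actual union object, with no memcpy and no pointer cast. It swaps the packed attribute for a portable static_assert on the struct size, and the helper name pun_through_union is hypothetical:

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stddef.h>   /* size_t */

struct result { int a; int b; int c; int d; };
union convert { int arr[4]; struct result res; };

/* Four ints in exactly 4 * sizeof(int) bytes leaves no room for padding. */
static_assert(sizeof(struct result) == sizeof(int[4]), "struct result has padding");

/* Hypothetical helper: store through the .arr member, then read the same
   bytes back through the .res member of the same union object. */
struct result pun_through_union(const int src[4])
{
    union convert cnv;
    for (size_t i = 0; i < 4; i++)
        cnv.arr[i] = src[i];
    return cnv.res;
}
```

Because both accesses go through members of a real union object, the alignment and effective-type questions raised by the pointer cast do not come up in this version. Note that this relies on C semantics for reading a union member other than the one last stored; C++ treats that differently.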
0___________