
According to cppreference and "Purpose of Unions in C and C++", the code below is UB (undefined behavior):

```cpp
#include <cstdint>

// convert char[8] to uint64_t
uint64_t convert(char c[8]) {
  union{
    uint64_t v;
    char c[8];
  } u;
  for(int i = 0; i < 8; i++) {
    u.c[i] = c[i];
  }
  return u.v;
}

// another example
union U {
  uint64_t v;
  struct{
    uint32_t l;
    uint32_t h;
  }d;
};

uint64_t setlow(uint64_t v, uint32_t l) {
  U u{v};
  u.d.l = l;
  return u.v;
}
```

However, even though it's UB, this kind of usage is very convenient, and I find that it actually works with most compilers (GCC/Clang). So I want to know: is there any compiler/implementation in practice that will break the code above?
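For comparison, here is a minimal sketch of the two functions from the question written without reading an inactive union member. It copies object representations with `std::memcpy`, which is well-defined for trivially copyable types such as `std::uint64_t`, so it keeps the same byte-order-dependent results as the union versions. `setlow_value` is a hypothetical extra helper that works on the numeric value instead, so it always replaces the low 32 bits regardless of byte order.

```cpp
#include <cstdint>
#include <cstring>

// Copy 8 bytes into a uint64_t. The numeric result depends on the platform's
// byte order, exactly like the union version in the question.
std::uint64_t convert(const char c[8]) {
    std::uint64_t v;
    std::memcpy(&v, c, sizeof v);
    return v;
}

// Overwrite the first 4 bytes of v's object representation with l, which is
// what writing u.d.l does in the union version: the low half on little-endian
// targets, the high half on big-endian ones.
std::uint64_t setlow(std::uint64_t v, std::uint32_t l) {
    std::memcpy(&v, &l, sizeof l);
    return v;
}

// Hypothetical value-based alternative: always replaces the numerically low
// 32 bits, independent of byte order.
std::uint64_t setlow_value(std::uint64_t v, std::uint32_t l) {
    return (v & 0xFFFFFFFF00000000ull) | l;
}
```

With optimization enabled, GCC and Clang typically compile these fixed-size `memcpy` calls down to ordinary loads and stores, so the well-defined form usually costs nothing extra.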

konchy
  • Undefined behaviour is not required to "break". You are relying on behaviour that, even if the compilers you test give the results you expect, is not guaranteed. The behaviour may differ with other compilers, or it may not. The behaviour may change when the compiler you rely on is updated. If you intend to rely on such things, you need to be prepared to manage the consequences if something changes in future. In practice, I've noticed that people are quite willing to use such things today when it suits them but are nowhere to be seen if/when the code eventually needs to be corrected. – Peter Aug 19 '22 at 02:01
  • afaik gcc and clang relax this rule and allow aliasing via union members even if it's UB in the C++ standard. – bolov Aug 19 '22 at 02:05
  • Most UB “works”; the problem is that if you rely on it to work and it doesn’t, depending on the context of your application it could be a very costly mistake or a shrug of the shoulders. In general, it’s best not to use anything that isn’t defined by the standard. In practice wysiwyg is good enough for a lot of programs. – Taekahn Aug 19 '22 at 02:08
  • The compiler is allowed to assume UB will never happen. This can be very important for optimizations. For example, the compiler is free to assume that, since `u` is initialized to `v` and then `u.v` is returned (with nothing else assigning to a field directly on `u` in between), `setlow` is really just `return v`, and it would be well within its rights to optimize that function down to that (and then subsequently inline it to nothing at call sites). This sort of optimization seems "obviously" wrong in this function, but it can happen at a distance in situations much more difficult to spot. – Silvio Mayolo Aug 19 '22 at 03:16
  • @bolov gcc explicitly allows type-punning through unions. clang is almost guaranteed to do so as well. MSVC doesn't exploit strict-aliasing at all. This is one of those technically UB but not really things. – Passer By Aug 19 '22 at 04:50
  • @SilvioMayolo yes, I know compilers can do that optimization since it doesn't violate the standard, but will they really do it in practice? I've tested it with the latest GCC and Clang, compiled with `-O3`, and it always works as expected. – konchy Aug 19 '22 at 06:49
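The GCC behaviour mentioned in the comments is documented in the GCC manual under `-fstrict-aliasing`: reading a different member than the one most recently written is allowed "provided the memory is accessed through the union type", and the manual gives a counterexample where the member's address is taken and the read goes through a plain pointer. A minimal sketch of that distinction, with hypothetical names and assuming a 32-bit IEEE-754 `float`:

```cpp
#include <cstdint>

// Hypothetical union used only for illustration.
union Pun {
    float f;
    std::int32_t i;
};

// Read goes through the union object itself: covered by GCC's documented
// extension, although it is still undefined behaviour in ISO C++.
std::int32_t bits_via_union(float x) {
    Pun p;
    p.f = x;
    return p.i;
}

// Read goes through an int* obtained from a member: the GCC manual says this
// form might not work, because the access no longer uses the union type.
std::int32_t bits_via_pointer(float x) {
    Pun p;
    p.f = x;
    std::int32_t* ip = &p.i;
    return *ip;
}
```

The question's `convert` and `setlow` only use the first form, which is why they behave as expected on GCC and Clang.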

1 Answer


The reason the language hasn’t blessed the compiler extension (based on questionable C wording) is that it doesn’t fit with the rest of the language. Consider:

```cpp
union U {
    float f;
    int i;
    int operator()();
};
static int putget(float &f,int &i) {
    i=0;
    f=1;
    return i;
}
int U::operator()() {return putget(f,i);}
```

With the definition of `putget` immediately available, GCC and Clang elide the store to `i` (it is dead, since `f=1` overwrites the same bytes), and yet they return 0, as if that store had not only happened but had not been overwritten.
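By contrast, here is a minimal sketch of the same store sequence made well-defined. The hypothetical `putget_bytes` performs "i = 0", then "f = 1", then reads an int, with every access going through `std::memcpy` on one byte buffer, so no object is accessed through an lvalue of the wrong type and the later copy must be honoured. Assuming a 32-bit IEEE-754 `float`, it returns 0x3F800000 (1065353216), the bit pattern of 1.0f, rather than 0.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical well-defined counterpart of putget: both stores and the final
// load are byte copies into/out of the same buffer.
std::int32_t putget_bytes() {
    static_assert(sizeof(float) == sizeof(std::int32_t), "assumes a 32-bit float");
    unsigned char storage[sizeof(float)];

    std::int32_t i = 0;
    std::memcpy(storage, &i, sizeof i);  // corresponds to i = 0
    float f = 1.0f;
    std::memcpy(storage, &f, sizeof f);  // corresponds to f = 1, same bytes

    std::int32_t result;
    std::memcpy(&result, storage, sizeof result);  // read the bytes back as an int
    return result;
}
```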

Davis Herring