Is it undefined behaviour to read a different member than was written in a Union?

Question

union test{
  char a; // 1 byte
  int b;  // 4 bytes
};

int main(){ 
  test t;
  t.a = 5;
  return t.b;
}

This link says: https://en.cppreference.com/w/cpp/language/union

It's undefined behavior to read from the member of the union that wasn't most recently written.

According to this, does my sample code above have UB? If so, then what's the point of a union? I thought the whole point is to read/write different value types from the same memory location.

If I need to access the most recently written value then I will just use a regular variable and not a Union.

The initial motivation for unions was to save memory when you only ever need one of the members at a time, afaik. It was in the old days when you didn't have loads of memory. — 463035818_is_not_an_ai, Jun 09 '21 at 13:06
A real-life use case for unions: we have an API that lets you query a number of different parameters. These parameters can have different types (including classes). Because of this, we return a wrapper struct that contains an enum denoting the active member and a union of the possible parameter types, filled with the correct data. — Yksisarvinen, Jun 09 '21 at 13:08
_If so then what's the point of a Union then?_ Employing union does not imply the need for reading its inactive members. For instance, see the implementation of `std::basic_string` in libstdc++, where a buffer for short strings is aliased with the capacity member variable. Since you can detect short/long strings by comparing the data pointer with the address of this buffer, there is no need for accessing an inactive member and union is still useful. Code: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/basic_string.h#L178. — Daniel Langr, Jun 09 '21 at 13:20
Does this answer your question? [Unions and type-punning](https://stackoverflow.com/questions/25664848/unions-and-type-punning) — Daniel Langr, Jun 09 '21 at 13:24
Also relevant: [Unions, aliasing and type-punning in practice: what works and what does not?](https://stackoverflow.com/q/54762186/580083). — Daniel Langr, Jun 09 '21 at 13:27
@463035818_is_not_a_number It is definitely not just some _old-days-issue_. Unions are still used, e.g., in implementations of string classes in all mainstream libraries (libstdc++, libc++, Microsoft STL). You don't want string objects to be unnecessarily large, since there may be a lot of them in memory at once. — Daniel Langr, Jun 09 '21 at 13:36
An alternative is to use a smart union, [`std::variant`](https://en.cppreference.com/w/cpp/utility/variant). — Eljay, Jun 09 '21 at 14:02
Note that the comment in `int b; // 4 bytes` is often correct, but the C++ standard does not require it. `int` has to be at least 2 bytes, and it has to be at least as large as `short` (which also has to be at least 2 bytes). — Pete Becker, Jun 09 '21 at 15:06
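To illustrate the `std::variant` suggestion, here is a minimal C++17 sketch of the question's union rewritten as `std::variant<char, int>`. The alias `test_variant` and the helper `int_or` are hypothetical names chosen for illustration. Unlike a raw union, the variant remembers which alternative is active: `std::get_if` returns a null pointer for an inactive alternative, and `std::get` throws `std::bad_variant_access`.

```cpp
#include <cassert>
#include <variant>

// Hypothetical type-safe counterpart of the question's `union test`.
using test_variant = std::variant<char, int>;

// Returns the stored int if that alternative is active, otherwise `fallback`.
// std::get_if yields nullptr instead of reading an inactive alternative.
int int_or(const test_variant& v, int fallback) {
  if (const int* p = std::get_if<int>(&v))
    return *p;
  return fallback;
}
```

For example, after `test_variant t = char{5};`, `int_or(t, -1)` yields `-1`, and `std::get<int>(t)` throws rather than silently reinterpreting the bytes.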

Bathsheba · Accepted Answer · 2021-06-09T13:09:03.597

Yes, the behaviour is undefined in C++.

When you write a value to a member of union, think of that member becoming the active member.

The behaviour of reading any member of a union that is not the active member is undefined.
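To make the "active member" rule concrete, here is a minimal sketch using the question's `union test`; the function name `read_active_members` is hypothetical. Each read targets the member that was assigned most recently, so both reads are well defined, and the function returns 47.

```cpp
#include <cassert>

union test {
  char a;
  int b;
};

// Every read below targets the active member (the one written last),
// so none of them is undefined behaviour in C++.
int read_active_members() {
  test t;
  t.a = 5;            // `a` becomes the active member
  int first = t.a;    // OK: reading the active member
  t.b = 42;           // assignment switches the active member to `b`
  int second = t.b;   // OK: `b` is now active (reading t.a here would be UB)
  return first + second;
}
```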

In C++, a union is often paired with another variable that identifies which member is active.
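A minimal sketch of that pattern, assuming a hypothetical `tagged` struct: an enum member records which member of the anonymous union was written last, and the accessors check it (here with `assert`) before reading.

```cpp
#include <cassert>

// Hypothetical tagged union: `tag` records which union member is active.
struct tagged {
  enum class kind { character, integer } tag;
  union {
    char a;
    int b;
  };

  // Each setter writes a member and updates the tag to match.
  void set_char(char c) { a = c; tag = kind::character; }
  void set_int(int i)   { b = i; tag = kind::integer; }

  // Reading is only valid for the active member; assert guards the rule.
  char get_char() const { assert(tag == kind::character); return a; }
  int  get_int()  const { assert(tag == kind::integer);   return b; }
};
```

This bookkeeping is essentially what `std::variant` does for you, since it stores the index of the active alternative alongside the storage.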

edited Jun 09 '21 at 13:09

answered Jun 09 '21 at 13:05

Bathsheba

Importantly, UB does not mean "crashes". Doing what the OP thought it *should* do, until one day it doesn't, is also valid UB. – Caleth Jun 09 '21 at 13:07
Does this apply in C too? Or only in C++? – Dan Jun 09 '21 at 13:08
C++ is stricter. In C you can indeed type pun through a `union`. – Bathsheba Jun 09 '21 at 13:08
I see, so this is valid in C but not in C++. Is there a link which mentions that it's valid in C? – Dan Jun 09 '21 at 13:09
See https://stackoverflow.com/questions/25664848/unions-and-type-punning – Bathsheba Jun 09 '21 at 13:09
Thanks, this makes Unions, well, mostly pointless. – Dan Jun 09 '21 at 13:11
It might be worth adding that some implementations (maybe most) support reading an inactive union member as a non-standard extension. (IIRC, for example, the SSO implementation in libc++ relies on this.) – Daniel Langr Jun 09 '21 at 13:14
@Dan - Unions are not pointless in embedded systems where memory space may be severely restricted, which is why they exist. In your example the union is 4 bytes, whereas the two members separately would require 5. – ChrisBD Jun 09 '21 at 13:26
@ChrisBD Two members separately would require 5, but putting them into `struct` or `class` would require 8 due to padding/alignment. – Daniel Langr Jun 09 '21 at 13:29
@ChrisBD Then you can use bit-fields instead to save memory. – Dan Jun 09 '21 at 13:29
@DanielLangr - correct, which is why I didn't mention structs – ChrisBD Jun 09 '21 at 13:43
@Dan - I don't envy anyone trying to use bitfields instead of a union when handling floating point numbers. That's the beauty of unions they can "contain" anything. – ChrisBD Jun 10 '21 at 12:22
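Since C++ does not sanction the C-style union pun, the usual well-defined way to inspect, say, the bit pattern of a float is to copy its bytes with `std::memcpy` (C++20 also provides `std::bit_cast` for this). A minimal sketch, assuming a 32-bit IEEE-754 `float` (both assumptions are checked with `static_assert`); `float_bits` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 32-bit IEEE-754 (IEC 559) type; checked at compile time.
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "expects IEEE-754 float");

// Returns the bit pattern of `f` without reading an inactive union member:
// std::memcpy copies the object representation into a uint32_t.
std::uint32_t float_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}
```

Under these assumptions, `float_bits(1.0f)` returns `0x3F800000`. Compilers typically optimize the `memcpy` away, so this costs nothing compared with the union pun.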

Daniel Langr · Answer 2 · 2021-06-09T14:09:54.533

Your implication that having unions without the possibility of reading their inactive members makes them useless is wrong. Consider the following simplified implementation of a string class:

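#include <cstddef>   // std::size_t
#include <cstring>   // std::strlen, std::memcpy

class string {
  char* data_;
  std::size_t size_;
  union {
    std::size_t capacity_;
    char buffer_[16];
  };

  bool is_short() const { return data_ == buffer_; }
  // ... (other private members omitted)

public:
  string(const char* str) : size_(std::strlen(str)) {
    if (size_ < 16)
      data_ = buffer_;    // short string, buffer_ will be active
    else {
      capacity_ = size_;  // long string, capacity_ is active
      data_ = new char[capacity_ + 1];
    }
    std::memcpy(data_, str, size_ + 1);  // copy including the terminating '\0'
  }

  ~string() { if (!is_short()) delete[] data_; }  // only long strings own heap memory

  string(const string&) = delete;             // copying omitted in this simplified sketch
  string& operator=(const string&) = delete;

  std::size_t capacity() const { return is_short() ? 15 : capacity_; }
  const char* data() const { return data_; }
  // ... (other public members omitted)
};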

Here, if the stored string has fewer than 16 characters, it is stored in buffer_ (leaving room for the terminating null character) and data_ points to it. Otherwise, data_ points to a dynamically allocated buffer.

Consequently, you can distinguish the two cases (short/long string) by comparing data_ with buffer_. When the string is short, buffer_ is active and you don't need to read capacity_, since you know it is 15. When the string is long, capacity_ is active and you don't need to read buffer_, since it is unused.

Exactly this approach is used in libstdc++. It is a bit more complicated there, since std::string is just a specialization of the std::basic_string class template, but the idea is the same. Source code from include/bits/basic_string.h:

enum { _S_local_capacity = 15 / sizeof(_CharT) };

union
{
  _CharT    _M_local_buf[_S_local_capacity + 1];
  size_type _M_allocated_capacity;
};

This can save a lot of space if your program works with many strings at once (consider, e.g., databases). Without the union, each string object would take 8 more bytes of memory (on a typical 64-bit platform, where size_t is 8 bytes).
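The saving can be checked directly with `sizeof`. A minimal sketch using two hypothetical structs that mirror the data members of the simplified string class in this answer, one sharing storage through the union and one not:

```cpp
#include <cstddef>

// Hypothetical layout with capacity_ and buffer_ sharing storage.
struct with_union {
  char* data_;
  std::size_t size_;
  union {
    std::size_t capacity_;
    char buffer_[16];
  };
};

// Hypothetical layout with separate storage for both members.
struct without_union {
  char* data_;
  std::size_t size_;
  std::size_t capacity_;
  char buffer_[16];
};

// The union costs only as much as its largest member (buffer_), so the
// separate capacity_ field is pure overhead in without_union.
constexpr std::size_t saved_bytes = sizeof(without_union) - sizeof(with_union);
```

On a typical 64-bit platform, `sizeof(with_union)` is 32 and `sizeof(without_union)` is 40, so `saved_bytes` is 8.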
