
Is there a standard (or at least safe) way to compare unions for equality in C and/or C++? I expect that bitwise comparison would be useful in a number of scenarios regardless of the last-assigned member in each union; for instance, a particular bit-pattern could be reserved to mean "value is uninitialized", and it would be useful to be able to check if the union is uninitialized without needing to specify an "active" member.

An example in C++ (though I think the concept extends to C using non-member functions):

union MyData
{
  public:
    // Assume I'm compiling this on a platform where the size of `int`
    // exceeds the size of `char*`, or that I'm using some combination
    // of `#ifdef`s or similar to ensure that my numeric type is sufficiently
    // large, or that I include an extra member that is known to be
    // exactly the size of the larger member, just for the sake of
    // comparison.
    int int_member;
    char* ptr_member;

    bool isInitialized() const
    {
      return (*this != INVALID_VAL);
    }

    bool operator==(MyData const& rhs) const
    {
      return /* ??? */;
    }

  private:
    constexpr MyData INVALID_VAL { /* ??? */ };
};

// ... later, in client code...

MyData d;
bool initialized{d.isInitialized()};  // false
d.ptr_member = new char[32];
initialized = d.isInitialized();  // true

Here, INVALID_VAL could probably be defined by setting int_member to the max negative int value, because that's an odd value, so it won't fall on a word boundary and is therefore highly unlikely ever to be assigned to the char* member (assuming that assignments typically come directly from new).

One possible implementation of operator== would be simply:

return int_member == rhs.int_member;

Even though it's not known whether int_member is the "active" member, I expect this to be safe, because I see no reason why a static cast from char* to int should fail or be problematic. Is that correct?

If this implementation is unsafe, something like the following should be possible (using C-style casts in C, of course):

return static_cast<void*>(*this) == static_cast<void*>(rhs);

...though of course if MyData is larger than the size of a pointer, you'd have to start messing around with sizeof to make this work.

Does anyone do this? Is the first (simpler) implementation safe? Is there any reason not to do it?
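
For reference, the sizeof-based comparison mentioned above could be sketched with std::memcmp, which compares the bytes of the two objects without naming an active member. This is only a sketch under assumptions: `clearBytes` and `sameBytes` are hypothetical helper names, and the result is only meaningful if every byte of the union has a known value, which is why the example zeroes the whole object before any member is assigned. Bytes that lie outside the most recently assigned member are not guaranteed to keep their values, so this is not a portable answer by itself.

```cpp
#include <cstring>  // std::memcmp, std::memset

union MyData
{
    int int_member;
    char* ptr_member;
};

// Hypothetical helper: zero every byte of the union so that a later
// bytewise comparison does not read indeterminate bytes.
inline void clearBytes(MyData& d)
{
    std::memset(&d, 0, sizeof d);
}

// Hypothetical helper: compare the object representations byte by byte,
// regardless of which member was assigned last.
inline bool sameBytes(MyData const& lhs, MyData const& rhs)
{
    return std::memcmp(&lhs, &rhs, sizeof(MyData)) == 0;
}
```

Because the comparison covers sizeof(MyData) bytes, it does not matter whether the union is larger than a pointer.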

Kyle Strand

2 Answers


I think a better approach would be to wrap your union inside a class or struct with an enum field that records which member was assigned last, e.g.:

class MyData {
    enum {
        uninitialized, int_member, ptr_member
    } last_member = uninitialized;

    union {
        int int_member;
        char* ptr_member;
    } union_fields;

public:
    bool isInitialized() const
    {
        return last_member != uninitialized;
    }
};

The in-class initialization of last_member works if you have C++11; otherwise, just initialize it in the default constructor.

Create accessors for the two fields and set last_member accordingly. It would also be good to add checks in the accessor methods to make sure that only the "active member" can be accessed.
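
A minimal sketch of what those accessors might look like, assuming C++11. The names `Member`, `setInt`, `setPtr`, `getInt` and `getPtr` are made up for illustration, and the `operator==` is just one possible definition of equality (same active member, same value):

```cpp
#include <cassert>

class MyData {
    // Scoped enum (an assumption; the unnamed enum above works as well).
    enum class Member { uninitialized, int_member, ptr_member };
    Member last_member = Member::uninitialized;

    union {
        int int_member;
        char* ptr_member;
    } union_fields;

public:
    bool isInitialized() const
    {
        return last_member != Member::uninitialized;
    }

    // Setters record which member becomes active.
    void setInt(int value)
    {
        union_fields.int_member = value;
        last_member = Member::int_member;
    }

    void setPtr(char* value)
    {
        union_fields.ptr_member = value;
        last_member = Member::ptr_member;
    }

    // Getters check (in debug builds) that the requested member is active.
    int getInt() const
    {
        assert(last_member == Member::int_member);
        return union_fields.int_member;
    }

    char* getPtr() const
    {
        assert(last_member == Member::ptr_member);
        return union_fields.ptr_member;
    }

    // Equal only if the same member is active and holds the same value.
    friend bool operator==(MyData const& lhs, MyData const& rhs)
    {
        if (lhs.last_member != rhs.last_member)
            return false;
        switch (lhs.last_member) {
            case Member::int_member:
                return lhs.union_fields.int_member == rhs.union_fields.int_member;
            case Member::ptr_member:
                return lhs.union_fields.ptr_member == rhs.union_fields.ptr_member;
            case Member::uninitialized:
                return true;  // two untouched objects compare equal
        }
        return false;
    }
};
```

Note that `assert` is compiled out when `NDEBUG` is defined, so a release build would need a different kind of check if the accessors must stay guarded.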

ALXGTV

Of course it's unsafe.

You can't assume that an int has the same size as a char *, for instance. There might also be padding, whose content is unspecified and often random.

unwind
  • That sounds reasonable, but assume that I either compiled this only on a system where I *know* that `int` and `char*` have the same size (which is....most of them? Do you have a quick example of a commonly-used system where this is *not* the case? Some ARM thing, maybe?) OR that I appropriately define my members to avoid this problem (which shouldn't be too hard using some combination of `#ifdef`s and `decltype` with the appropriate headers). – Kyle Strand Apr 17 '15 at 15:21
  • And thanks for editing out the "undefined behavior" complaint (though I wish I could see AlterMann's comment, which they apparently deleted). – Kyle Strand Apr 17 '15 at 15:22
  • Actually, all you need to "fix" the size problem is to ensure that the member you're using for the comparison is at least as big as the other member. So if `char*` happens to be larger on your platform, you can do the comparison against the `char*` member, which should be safe because pointers don't need to be valid in order to be compared. – Kyle Strand Apr 17 '15 at 15:27
  • @KyleStrand int and char* do not have the same size on most systems nowadays. int is usually 32 bits, whereas the size of char* depends on the word size; most new systems have a 64-bit processor word size, so pointers are most likely 64 bits. – ALXGTV Apr 17 '15 at 16:28
  • @ALXGTV Huh--I figured compilers for 64-bit systems would make `int`s and pointers the same size, typically. Interesting. – Kyle Strand Apr 17 '15 at 16:56
  • @KyleStrand - 8-bit (8051) microcontrollers have `char*` sizes of 1 byte, 2 bytes, or 3 bytes, depending on which memory region you are accessing. You definitely can't assume `sizeof(int) == sizeof(char*)`. – Mark Lakata Oct 25 '17 at 20:44
  • @MarkLakata Looking back, I think my biggest problem in this question was saying `int` instead of `long long` or, more specifically and possibly more efficiently, `intptr_t` (with some compile-time check that `sizeof(intptr_t) >= sizeof(int)`). And I don't think either of these answers actually addresses whether an implementation using a sufficiently large integral type would work. – Kyle Strand Oct 25 '17 at 23:21
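
The `intptr_t` idea from the last comment might be sketched as follows. This is an illustration under assumptions, not a verdict on the question: `rawBits` is a hypothetical helper, the `static_assert`s encode the size requirements discussed in the comments, and the bits are copied out with std::memcpy so that the code does not read a union member other than the one last assigned.

```cpp
#include <cstdint>  // std::intptr_t
#include <cstring>  // std::memcpy

union MyData
{
    std::intptr_t int_member;  // widened from int, per the last comment
    char* ptr_member;
};

// Compile-time checks for the size assumptions; the build fails on a
// platform where they do not hold.
static_assert(sizeof(std::intptr_t) >= sizeof(int),
              "intptr_t must be able to replace int");
static_assert(sizeof(std::intptr_t) >= sizeof(char*),
              "intptr_t must cover the pointer member");
static_assert(sizeof(MyData) == sizeof(std::intptr_t),
              "no bytes of the union may lie outside the integer member");

// Hypothetical helper: copy the union's bytes into an integer.
inline std::intptr_t rawBits(MyData const& d)
{
    std::intptr_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

inline bool operator==(MyData const& lhs, MyData const& rhs)
{
    return rawBits(lhs) == rawBits(rhs);
}

inline bool operator!=(MyData const& lhs, MyData const& rhs)
{
    return !(lhs == rhs);
}
```

If the pointer member is narrower than `intptr_t` on some platform, the bytes outside it are not guaranteed to be preserved when it is assigned, so the comparison would not be reliable there.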