Equality comparison of unions?

Question

Is there a standard (or at least safe) way to compare unions for equality in C and/or C++? I expect that bitwise comparison would be useful in a number of scenarios regardless of the last-assigned member in each union; for instance, a particular bit-pattern could be reserved to mean "value is uninitialized", and it would be useful to be able to check if the union is uninitialized without needing to specify an "active" member.

An example in C++ (though I think the concept extends to C using non-member functions):

union MyData
{
  public:
    // Assume I'm compiling this on a platform where the size of `int`
    // exceeds the size of `char*`, or that I'm using some combination
    // of `#ifdef`s or similar to ensure that my numeric type is sufficiently
    // large, or that I include an extra member that is known to be
    // exactly the size of the larger member, just for the sake of
    // comparison.
    int int_member;
    char* ptr_member;

    bool isInitialized() const
    {
      return !(*this == INVALID_VAL);
    }

    bool operator==(MyData const& rhs) const
    {
      return /* ??? */;
    }

  private:
    // Declared static so the union does not contain itself;
    // to be defined elsewhere as { /* ??? */ }.
    static const MyData INVALID_VAL;
};

// ... later, in client code...

MyData d;
bool initialized{d.isInitialized()};  // false
d.ptr_member = new char[32];
initialized = d.isInitialized();  // true

Here, INVALID_VAL could probably be defined by setting int_member to the max negative int value, because that's an uneven value, so it won't be on a word boundary and therefore is highly unlikely to ever be assigned to the char* member (assuming that assignments typically come directly from new).

One possible implementation of operator== would be simply:

return int_member == rhs.int_member;

Even though it's not known whether int_member is the "active" member, I expect this to be safe, because I see no reason why a static cast from char* to int should fail or be problematic. Is that correct?

If this implementation is unsafe, something like the following should be possible (using C-style casts in C, of course):

return static_cast<void*>(*this) == static_cast<void*>(rhs);

...though of course if MyData is larger than the size of a pointer, you'd have to start messing around with sizeof to make this work.
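For reference, here is a minimal sketch of what a byte-wise comparison could look like. Two caveats apply to the snippets above: C++ does not define the result of reading a union member other than the one most recently written, and `static_cast<void*>(*this)` does not compile for a union without a conversion operator. The sketch therefore compares object representations with `std::memcmp`. The names `bitwiseEqual` and `makeInvalid`, the use of `std::uintptr_t` instead of `int`, and the all-ones sentinel are assumptions made for illustration, not part of the original code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the union in the question. std::uintptr_t replaces
// int so that the integer member can be as wide as the pointer member.
union MyData
{
    std::uintptr_t int_member;
    char* ptr_member;
};

// The comparison below is only meaningful if every byte of the union is
// written by whichever member was assigned last.
static_assert(sizeof(std::uintptr_t) == sizeof(char*) &&
              sizeof(MyData) == sizeof(char*),
              "members must span the whole union");

// Compare object representations byte for byte, without naming an active member.
inline bool bitwiseEqual(MyData const& a, MyData const& b)
{
    return std::memcmp(&a, &b, sizeof(MyData)) == 0;
}

// Build the reserved "uninitialized" pattern. All bits set is an arbitrary
// choice for this sketch; any pattern that is never a real value would do.
inline MyData makeInvalid()
{
    MyData d;
    d.int_member = ~std::uintptr_t{0};
    return d;
}
```

With this, `isInitialized()` could be written as `!bitwiseEqual(*this, makeInvalid())`. Two pointers that compare equal are not guaranteed to have identical bytes on every platform, so this is a sketch for mainstream flat-memory targets rather than a portable guarantee.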

Does anyone do this? Is the first (simpler) implementation safe? Is there any reason not to do it?

score 3 · Answer 1 · answered Apr 17 '15 at 15:25

I think a better approach would be to wrap your union inside a class or struct with an enum field storing which was the last member accessed e.g.

class MyData {
    enum {
        uninitialized, int_member, ptr_member
    } last_member = uninitialized;

    union {
        int int_member;
        char* ptr_member;
    } union_fields;

public:
    bool isInitialized() const
    {
        return last_member != uninitialized;
    }
};

The in-class initialization of last_member requires C++11; otherwise, just initialize it in the default constructor.

Create accessors for the two fields and set last_member accordingly. It would also be good to add checks in the accessor methods to make sure that only the "active" member can be accessed.
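The accessors described above might look roughly like the following sketch. The method names (`setInt`, `setPtr`, `getInt`, `getPtr`), the scoped enum, and the choice to throw `std::logic_error` on a mismatched read are assumptions for illustration; an assertion or an error code would work just as well.

```cpp
#include <stdexcept>

class MyData
{
    // Tag recording which union member was written last.
    enum class Member { uninitialized, int_member, ptr_member };
    Member last_member = Member::uninitialized;

    union
    {
        int int_member;
        char* ptr_member;
    } union_fields;

public:
    bool isInitialized() const
    {
        return last_member != Member::uninitialized;
    }

    // Each setter writes the value and updates the tag together.
    void setInt(int value)
    {
        union_fields.int_member = value;
        last_member = Member::int_member;
    }

    void setPtr(char* value)
    {
        union_fields.ptr_member = value;
        last_member = Member::ptr_member;
    }

    // Each getter refuses to read a member that is not the active one.
    int getInt() const
    {
        if (last_member != Member::int_member)
            throw std::logic_error("int_member is not the active member");
        return union_fields.int_member;
    }

    char* getPtr() const
    {
        if (last_member != Member::ptr_member)
            throw std::logic_error("ptr_member is not the active member");
        return union_fields.ptr_member;
    }
};
```

Because the tag is checked on every read, an equality operator can compare the tags first and then only the active member, which avoids any bitwise comparison.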

answered Apr 17 '15 at 15:25

ALXGTV


That's probably a preferable approach (especially in the case where the union will be tagged anyway). I'm still curious about bitwise comparisons, though. – Kyle Strand Apr 17 '15 at 15:30
Or use `boost::variant`. – Lightness Races in Orbit Apr 17 '15 at 15:45
@LightningRacisinObrit Thanks for the suggestion, I haven't really used Boost that much. – ALXGTV Apr 17 '15 at 16:24
@LightningRacisinObrit Thanks, but this is more about learning the language than about solving a particular problem; in my actual code, the union is indeed tagged. – Kyle Strand Apr 17 '15 at 16:57
@KyleStrand: Stack Overflow is specifically for "solving a particular problem", not for "learning the language". For "learning the language" there are [books](http://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list). – Lightness Races in Orbit Apr 17 '15 at 17:18
@KyleStrand: Okay you know best. – Lightness Races in Orbit Apr 17 '15 at 17:26
@LightningRacisinObrit Actually Stack Overflow is for questions regarding specific localized programming issues (for which neither a google search nor any books have an answer) and not for "solving a particular problem" that is the programmer's job. – ALXGTV Apr 18 '15 at 14:04
@ALXGTV: Same thing, to a certain level of accuracy :P You are right, of course. – Lightness Races in Orbit Apr 18 '15 at 15:13

unwind · Answer 2 · 2015-04-17T15:09:12.070

1

Of course it's unsafe.

You can't assume that an int has the same size as a char *, for instance. There might also be padding, which is often random in content.
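As a rough illustration of the size problem, the sketch below checks at compile time whether a given member occupies every byte of a union. The helper `spansWholeUnion` and the two union types are hypothetical names introduced here; the results depend entirely on the target platform.

```cpp
#include <cstdint>

// The union from the question, plus a hypothetical variant with a wider integer.
union WithInt
{
    int int_member;
    char* ptr_member;
};

union WithUintptr
{
    std::uintptr_t int_member;
    char* ptr_member;
};

// True when a member of type Member occupies every byte of Union, so that
// comparing that member alone would not skip any bytes.
template <typename Member, typename Union>
constexpr bool spansWholeUnion()
{
    return sizeof(Member) == sizeof(Union);
}

// On a typical 64-bit desktop target the first value is false (4-byte int,
// 8-byte pointer) and the second is true. Neither result is guaranteed by
// the standard.
constexpr bool int_spans = spansWholeUnion<int, WithInt>();
constexpr bool uintptr_spans = spansWholeUnion<std::uintptr_t, WithUintptr>();
```

Feeding such a value into a `static_assert` turns a wrong size assumption into a compile error instead of a comparison that silently ignores some bytes.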

edited Apr 17 '15 at 15:09

answered Apr 17 '15 at 15:01

unwind


That sounds reasonable, but assume that I either compiled this only on a system where I *know* that `int` and `char*` have the same size (which is....most of them? Do you have a quick example of a commonly-used system where this is *not* the case? Some ARM thing, maybe?) OR that I appropriately define my members to avoid this problem (which shouldn't be too hard using some combination of `#ifdef`s and `decltype` with the appropriate headers). – Kyle Strand Apr 17 '15 at 15:21
And thanks for editing out the "undefined behavior" complaint (though I wish I could see AlterMann's comment, which they apparently deleted). – Kyle Strand Apr 17 '15 at 15:22
Actually, all you need to "fix" the size problem is to ensure that the member you're using for the comparison is at least as big as the other member. So if `char*` happens to be larger on your platform, you can do the comparison against the `char*` member, which should be safe because pointers don't need to be valid in order to be compared. – Kyle Strand Apr 17 '15 at 15:27
@KyleStrand int & char* do not have the same size on most systems nowadays. int is usually 32 bits whereas char* is dependent on the word size, most new systems have a processor word size of 64 bits hence pointers would most likely be 64 bits. – ALXGTV Apr 17 '15 at 16:28
@ALXGTV Huh--I figured compilers for 64-bit systems would make `int`s and pointers the same size, typically. Interesting. – Kyle Strand Apr 17 '15 at 16:56
@KyleStrand - 8-bit (8051) microcontrollers have `char*` sizes of 1 byte, 2 bytes, or 3 bytes, depending on which memory region you are accessing. You definitely can't assume `sizeof(int) == sizeof(char*)`. – Mark Lakata Oct 25 '17 at 20:44
@MarkLakata Looking back, I think my biggest problem in this question was saying `int` instead of `long long` or, more specifically and possibly more efficiently, `intptr_t` (with some compile-time check that `sizeof(intptr_t) >= sizeof(int)`). And I don't think either of these answers actually addresses whether an implementation using a sufficiently large integral type would work. – Kyle Strand Oct 25 '17 at 23:21
