
I have a large data model with many members, many of which are themselves large data models, nested several levels deep. The top-level class represents the overall model, which is serialized and sent to a server for backup. As a debugging step, we would like to deserialize a recent backup and compare it to the in-memory data model as it was at the time of backup; the two should be equal. The most obvious way to do this is to apply operator== to the current model and its serialized-then-deserialized version.

The problem is that the degree of nesting and the number of custom data structures mean a tremendous amount of code is needed to write all those operator== implementations. Many of the individual implementations will themselves be many lines long, since they must compare every member. We're easily talking >1k lines of code spent just on operator==. Even if we do all that, there is a lot of room for programmer error in something like this.

Is there any alternative for a quick and dirty (though reliable) equality check, perhaps using much lower level techniques, or anything that would not require a couple of days of doing nothing but writing operator== functions?

johnbakers

1 Answer


The tie solution is going to be your best bet.

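#include <tuple>        // std::tie
#include <type_traits>  // std::enable_if_t, std::is_base_of

// Any T deriving from equal_by_tie gets operator== and operator!= (found via ADL),
// both implemented in terms of mytie(T const&).
struct equal_by_tie {
  template<class T>
  using enable = std::enable_if_t<std::is_base_of<equal_by_tie, T>::value, bool>;
  template<class T>
  friend enable<T>
  operator==( T const& lhs, T const& rhs ) {
    return mytie(lhs) == mytie(rhs);
  }
  template<class T>
  friend enable<T>
  operator!=( T const& lhs, T const& rhs ) {
    return mytie(lhs) != mytie(rhs);
  }
};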

Now you have to write

struct some_thing : equal_by_tie {
  // data members x, y and mem3 are declared here
  friend auto mytie( some_thing const& self ) {
    // list every data member exactly once
    return std::tie( self.x, self.y, self.mem3 );
  }
};

and == and != are written for you.
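
To make the pattern concrete, here is a minimal, self-contained sketch of a nested model compared through its ties. The types address and person and the helper make_person are hypothetical names invented for illustration; only equal_by_tie and mytie come from the answer above.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

// Base class from the answer: supplies == and != for any derived T with a mytie().
struct equal_by_tie {
  template<class T>
  using enable = std::enable_if_t<std::is_base_of<equal_by_tie, T>::value, bool>;

  template<class T>
  friend enable<T> operator==(T const& lhs, T const& rhs) {
    return mytie(lhs) == mytie(rhs);
  }
  template<class T>
  friend enable<T> operator!=(T const& lhs, T const& rhs) {
    return mytie(lhs) != mytie(rhs);
  }
};

// Hypothetical nested member type.
struct address : equal_by_tie {
  std::string city;
  int zip = 0;
  friend auto mytie(address const& self) {
    return std::tie(self.city, self.zip);
  }
};

// Hypothetical top-level type; comparing `home` recurses into address's tie.
struct person : equal_by_tie {
  std::string name;
  address home;
  friend auto mytie(person const& self) {
    return std::tie(self.name, self.home);
  }
};

// Hypothetical helper that fills in a person for the usage example.
inline person make_person(std::string name, std::string city, int zip) {
  person p;
  p.name = std::move(name);
  p.home.city = std::move(city);
  p.home.zip = zip;
  return p;
}
```

Each type costs one line listing its members; the tuple comparison then handles nesting, because comparing the home elements finds the same operator== through ADL.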

There is currently no way to audit whether mytie is written correctly, short of some C++17 hackery built on structured bindings that is honestly not worth considering (it is a horrible hack; don't ask).

One way you can reduce the chance that mytie is wrong is to use it more.

Implement swap in terms of it (maybe using the same parent class trick as operator== above). Now implement operator= in terms of swap or mytie. Do the same for friend std::size_t hash(Foo const&) and hook that into your standard hasher.
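
As one possible sketch of reusing the tie, here is a hash built from the same mytie. The names hash_combine, hash_tuple, hash_by_tie and point3 are assumptions made up for this example, and the mixing step is just a common ad hoc formula, not something the standard library provides. It only handles members for which std::hash is already specialized.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Fold one element hash into a running seed (ad hoc mixing, assumption).
inline void hash_combine(std::size_t& seed, std::size_t h) {
  seed ^= h + 0x9e3779b9u + (seed << 6) + (seed >> 2);
}

// Hash every element of a tuple of references, in order.
template<class Tuple, std::size_t... Is>
std::size_t hash_tuple(Tuple const& t, std::index_sequence<Is...>) {
  std::size_t seed = 0;
  using expand = int[];
  (void)expand{0,
    (hash_combine(seed,
        std::hash<std::decay_t<std::tuple_element_t<Is, Tuple>>>{}(std::get<Is>(t))),
     0)...};
  return seed;
}

// Hash any type that exposes mytie(); mytie is found by ADL.
template<class T>
std::size_t hash_by_tie(T const& value) {
  auto t = mytie(value);
  return hash_tuple(
      t, std::make_index_sequence<std::tuple_size<decltype(t)>::value>{});
}

// Hypothetical type used only to exercise the helper.
struct point3 {
  int x = 0;
  int y = 0;
  std::string label;
  friend auto mytie(point3 const& self) {
    return std::tie(self.x, self.y, self.label);
  }
};
```

If a member is forgotten in mytie, equality, hashing and anything else built on it all go wrong together, which makes the omission more likely to be noticed.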

Insist that mytie be in the same order as your data declarations, and have it tie parent instances as sub-ties. Write a constexpr function that takes your system's structure/class alignment rules into account and calculates how big the structure should be. Static assert that sizeof(Foo) and calc_struct_size(tag<decltype(mytie(std::declval<Foo&>()))>) match. (Add in fudge factors for vtables or the like as required.) Now changing the layout of the struct without touching mytie will usually trip the static assert.
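
A rough sketch of such a size check follows. It assumes the usual layout rule (members placed in declaration order, each at the next offset that satisfies its alignment, total size rounded up to the largest alignment), which holds on common ABIs but is not guaranteed by the standard. The helpers align_up and layout and the type Foo are hypothetical; it ignores base classes, vtables, bit-fields and empty structs.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template<class T> struct tag {};

// Round offset up to the next multiple of align.
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
  return (offset + align - 1) / align * align;
}

// Walk the member list, placing each member at its aligned offset.
template<class... Ts> struct layout;
template<> struct layout<> {
  static constexpr std::size_t end(std::size_t offset) { return offset; }
  static constexpr std::size_t max_align() { return 1; }
};
template<class T, class... Rest> struct layout<T, Rest...> {
  static constexpr std::size_t end(std::size_t offset) {
    return layout<Rest...>::end(align_up(offset, alignof(T)) + sizeof(T));
  }
  static constexpr std::size_t max_align() {
    return alignof(T) > layout<Rest...>::max_align()
               ? alignof(T)
               : layout<Rest...>::max_align();
  }
};

// Predicted size of a struct whose members are exactly the tied types.
template<class... Ts>
constexpr std::size_t calc_struct_size(tag<std::tuple<Ts...>>) {
  using L = layout<std::remove_cv_t<std::remove_reference_t<Ts>>...>;
  return align_up(L::end(0), L::max_align());
}

// Hypothetical model type.
struct Foo {
  char flag;
  double weight;
  int count;
  friend auto mytie(Foo const& self) {
    return std::tie(self.flag, self.weight, self.count);
  }
};

// Fires if a member is added to Foo without updating mytie
// (unless the new member happens to fit entirely into padding).
static_assert(
    sizeof(Foo) ==
        calc_struct_size(tag<decltype(mytie(std::declval<Foo const&>()))>{}),
    "mytie(Foo) does not account for every member of Foo");
```

The check is a tripwire rather than a proof: a new member small enough to land in existing padding leaves sizeof unchanged and slips through.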

Compare each pair of fields in mytie for pointer inequality to ensure you don't list the same field twice; try to ensure that this check optimizes away to a constant true (tricky, as you'll want to do this check in debug, and debug often has optimizations turned off; maybe this is a unique situation of an assert you want to execute only in release builds!).
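
One way the duplicate-field check could look, as a plain runtime function you might wrap in an assert. The names distinct_addresses, tie_fields_distinct, good_tie and bad_tie are invented for this sketch. Note one assumption: it compares raw addresses, so tying both a nested struct and that struct's own first member would be reported as a duplicate.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <tuple>
#include <utility>

// Collect the address of every tied field and compare all pairs.
template<class Tuple, std::size_t... Is>
bool distinct_addresses(Tuple const& t, std::index_sequence<Is...>) {
  // The trailing nullptr keeps the array non-empty for an empty tie.
  const void* addrs[] = {
      static_cast<const void*>(std::addressof(std::get<Is>(t)))..., nullptr};
  const std::size_t n = sizeof...(Is);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      if (addrs[i] == addrs[j]) return false;  // same field tied twice
  return true;
}

// True when no field appears twice in mytie(value).
template<class T>
bool tie_fields_distinct(T const& value) {
  auto t = mytie(value);
  return distinct_addresses(
      t, std::make_index_sequence<std::tuple_size<decltype(t)>::value>{});
}

// Hypothetical type with a correct tie.
struct good_tie {
  int x = 0;
  int y = 0;
  friend auto mytie(good_tie const& s) { return std::tie(s.x, s.y); }
};

// Hypothetical type with a copy-paste bug: x is tied twice, y is missing.
struct bad_tie {
  int x = 0;
  int y = 0;
  friend auto mytie(bad_tie const& s) { return std::tie(s.x, s.x); }
};
```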

You'll also want to do some sanity checks. If your mytie contains raw pointers, == is wrong, because it compares addresses rather than the objects pointed to; the same goes for smart pointers. You want your == to be a deep equality.

To that end, maybe == is the wrong thing.

struct deep_equal_by_tie {
  template<class T>
  using enable = std::enable_if_t<std::is_base_of<deep_equal_by_tie, T>::value, bool>;
  template<class T>
  friend enable<T>
  deep_equal( T const& lhs, T const& rhs ) {
    // code to call deep_equal on each element of mytie(lhs) and mytie(rhs)
    // deep_equal on non-pointer basic types defined as ==
    // deep_equal on pointers is to check for null (nulls are equal)
    // then dereference and deep_equal
    // ditto for smart pointers
    // deep_equal on vectors and other std containers is to check size,
    // and if matches deep_equal on elements
  }
};
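
The comments above can be fleshed out roughly as follows. This is a sketch under several assumptions, not the only arrangement: it uses free function overloads in a hypothetical namespace deep instead of the friend-in-base-class trick, covers only raw pointers, std::unique_ptr, std::shared_ptr and std::vector, falls back to == for everything else, and would recurse forever on cyclic pointer graphs. The helper names has_tie, pointee_equal, equal_tuple and equal_impl, and the types leaf and node, are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

namespace deep {

// Forward declarations so every overload can see the others.
template<class T> bool deep_equal(T const& a, T const& b);
template<class T> bool deep_equal(T* const& a, T* const& b);
template<class T> bool deep_equal(std::unique_ptr<T> const& a, std::unique_ptr<T> const& b);
template<class T> bool deep_equal(std::shared_ptr<T> const& a, std::shared_ptr<T> const& b);
template<class T> bool deep_equal(std::vector<T> const& a, std::vector<T> const& b);

// True when mytie(x) can be found by ADL for a T const& argument.
template<class T, class = void> struct has_tie : std::false_type {};
template<class T>
struct has_tie<T, decltype(void(mytie(std::declval<T const&>())))> : std::true_type {};

// Pointer-like: two nulls are equal, one null is not, else compare pointees.
template<class P>
bool pointee_equal(P const& a, P const& b) {
  if (!a || !b) return !a && !b;
  return deep::deep_equal(*a, *b);
}

// Compare two ties element by element with deep_equal.
template<class Tuple, std::size_t... Is>
bool equal_tuple(Tuple const& a, Tuple const& b, std::index_sequence<Is...>) {
  bool result = true;
  using expand = int[];
  (void)expand{0,
    (result = result && deep::deep_equal(std::get<Is>(a), std::get<Is>(b)), 0)...};
  return result;
}

template<class T>  // types with mytie(): recurse through the tie
bool equal_impl(T const& a, T const& b, std::true_type) {
  auto ta = mytie(a);
  auto tb = mytie(b);
  return equal_tuple(
      ta, tb, std::make_index_sequence<std::tuple_size<decltype(ta)>::value>{});
}
template<class T>  // everything else: plain ==
bool equal_impl(T const& a, T const& b, std::false_type) { return a == b; }

template<class T> bool deep_equal(T const& a, T const& b) {
  return equal_impl(a, b, has_tie<T>{});
}
template<class T> bool deep_equal(T* const& a, T* const& b) {
  return pointee_equal(a, b);
}
template<class T> bool deep_equal(std::unique_ptr<T> const& a, std::unique_ptr<T> const& b) {
  return pointee_equal(a, b);
}
template<class T> bool deep_equal(std::shared_ptr<T> const& a, std::shared_ptr<T> const& b) {
  return pointee_equal(a, b);
}
template<class T> bool deep_equal(std::vector<T> const& a, std::vector<T> const& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i)
    if (!deep::deep_equal(a[i], b[i])) return false;
  return true;
}

}  // namespace deep

// Hypothetical model types used to exercise the overloads.
struct leaf {
  int id = 0;
  std::string name;
  friend auto mytie(leaf const& s) { return std::tie(s.id, s.name); }
};
struct node {
  std::unique_ptr<leaf> owned;
  std::vector<std::shared_ptr<leaf>> shared;
  friend auto mytie(node const& s) { return std::tie(s.owned, s.shared); }
};
```

Two node objects that own separate but identical leaf objects compare equal here, whereas == on the smart pointers themselves would report them as different.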

This, however, increases your workload. The idea is to increase reliability since, as you have noted, the hard part is that there is a lot of code and many places to make mistakes.

There is no easy way to do this.

memcmp is a bad idea if your data is anything other than perfectly packed plain old data with no pointers or virtual functions or anything. And it is easy for padding to slip into code, breaking memcmp-based equality; such breaks will be hard to find, as the contents of padding bytes are unspecified.
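
If you do want memcmp for a few leaf types, C++17 offers std::has_unique_object_representations as a compile-time guard against padding. A minimal sketch, with padded, dense and bitwise_equal as hypothetical names; note the trait says nothing about pointers, which have unique representations but still compare by address.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// On common ABIs there are padding bytes between tag and value.
struct padded {
  char tag;
  int value;
};

// Two ints back to back: no padding on common ABIs.
struct dense {
  int a;
  int b;
};

// memcmp-based equality, rejected at compile time for types where equal
// values can have different byte patterns (padding, floating point, ...).
template<class T>
bool bitwise_equal(T const& lhs, T const& rhs) {
  static_assert(std::has_unique_object_representations<T>::value,
                "T has padding or non-unique representations; memcmp is unsafe");
  return std::memcmp(&lhs, &rhs, sizeof(T)) == 0;
}
```

Calling bitwise_equal on two padded objects fails to compile on such platforms, which is the point: the padding problem is caught at build time instead of showing up as a sporadic mismatch.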

Yakk - Adam Nevraumont