21

Scenario

I have a class which I want to be able to compare for equality. The class is large (it contains a bitmap image) and I will be comparing it multiple times, so for efficiency I'm hashing the data and only doing a full equality check if the hashes match. Furthermore, I will only be comparing a small subset of my objects, so I'm only calculating the hash the first time an equality check is done, then using the stored value for subsequent calls.

Example

class Foo
{
public:

   Foo(int data) : fooData(data), notHashed(true) {}

private:

   void calculateHash()
   {
      hash = 0; // Replace with hashing algorithm
      notHashed = false;
   }

   int getHash()
   {
      if (notHashed) calculateHash();
      return hash;
   }

   inline friend bool operator==(Foo& lhs, Foo& rhs)
   {
      if (lhs.getHash() == rhs.getHash())
      {
         return (lhs.fooData == rhs.fooData);
      }
      else return false;
   }

   int fooData;
   int hash;
   bool notHashed;
};

Background

According to the guidance on this answer, the canonical form of the equality operator is:

inline bool operator==(const X& lhs, const X& rhs);

Furthermore, the following general advice for operator overloading is given:

Always stick to the operator’s well-known semantics.

Questions

  1. My function must be able to mutate it's operands in order to perform the hashing, so I have had to make them non-const. Are there any potential negative consequences of this (examples might be standard library functions or STL containers which will expect operator== to have const operands)?

  2. Should a mutating operator== function be considered contrary to its well-known semantics, if the mutation doesn't have any observable effects (because there's no way for the user to see the contents of the hash)?

  3. If the answer to either of the above is "yes", then what would be a more appropriate approach?

Community
  • 1
  • 1
JBentley
  • 6,099
  • 5
  • 37
  • 72
  • 9
    I suspect that 99% of all X's for which a stack overflow question exists of the form "is X a bad practice?" are bad practice. The people asking the question have done X in their code and are looking for moral support for just leaving it there. The other 1% have come up with some brilliant new practice and are just showing off. – Kaz Apr 12 '13 at 17:33
  • Why do you have a `friend` function inside the class declaration? Why use an explicit `inline` for a function which is defined in a class declaration? The `==` operator can just be a member. The left hand side is implicit (the `this` object). `bool operator == (const Foo &rhs) { ... }`. – Kaz Apr 12 '13 at 20:40
  • What if the `getHash` function was actually on a member object, or base class? Say you derive from some `widget` class which has a `gethash` function. Do you care that `gethash` performs lazy instantiation, and so is destructive, as long as it works? – Kaz Apr 12 '13 at 20:42
  • This question sorta gets why the google c++ style guide discourages operator overloading. http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml?showone=Operator_Overloading#Operator_Overloading – jlund3 Apr 12 '13 at 20:45
  • @Kaz If you declare `==` as a member, that forces one of the operands to be a `this` pointer (which may not always be what you want). Most of the guides I have read recommend implementing it (and other comparison operators) as a non-member friend function for that reason. – JBentley Apr 12 '13 at 20:57
  • @Kaz My use of 'inline' and 'friend' inside a class declaration may be a misunderstanding, that is just how I saw it implemented in an example I found. It was my belief that a `friend` function inside a class declaration is actually a non-member. See [this](http://stackoverflow.com/questions/7785886/access-friend-function-defined-in-class) for example. I think I added `inline` due to a mistaken belief that only member functions have it implicitly, but I see now that I may be wrong, and that it applies to anything within a class declaration. – JBentley Apr 12 '13 at 20:58
  • If the left operand is a `Foo`, why is it an advantage to use a non-member function, if that function has to be made a `friend` of the class to gain access? The advantage of nonmember functions is that they use the class public interface (are not friends). You can write them without changing the class declaration in any way (e.g. for third party class libraries). A function which a friend only of class `Foo` and which operates on a left operand of type `Foo` passed in by reference screams "I am a member function in disguise". – Kaz Apr 12 '13 at 21:04
  • @Kaz What if the left operand is *not* a Foo? – JBentley Apr 12 '13 at 21:05
  • If the left operand isn't a `Foo`, then C++ says it's can't be a (non-static) member function of class `Foo`. If the function needs access to to the class `Foo` bits of something, then it can be a static member of the `Foo` class. If it needs access to several private parts of different classes, then it can be a nonmember friend of all of those classes. If it needs no special access inside any class, then it can just be a plain nonmember function that nobody declares as a friend. – Kaz Apr 12 '13 at 21:10
  • @Kaz How would you call `operator==` if it were defined as a static member? Is it even possible? I can't get it to compile in MSVC (even if I fully qualify it), and [this question](http://stackoverflow.com/questions/11894124/why-overloaded-operators-cannot-be-defined-as-static-members-of-a-class) suggests not. – JBentley Apr 12 '13 at 21:21
  • Ah, no. Operators cannot be static member functions. If something like that was definable, then it would be easy to call from within the class, but from the outside it look like `Foo::operator == (a, b)`. – Kaz Apr 12 '13 at 21:25
  • @Kaz Yes, that is the conclusion I have reached when my compiler complained that `'Foo::operator ==' must be a non-static member` when I tried to call `Foo::operator==(a, b)` (which is unpleasant syntax in any case). [This answer](http://stackoverflow.com/a/11894935/1227469) claims that it is possible as a `static friend` function, but when I try that, my compiler complains instead that `'==' : is not a member of 'Foo`, so I'm not sure what that answer is talking about. – JBentley Apr 12 '13 at 21:34
  • 1
    I added a comment to that answer. – Kaz Apr 12 '13 at 21:41
  • A `static friend` is no longer a member function, which is why it works. The `static` has that meaning which it has for nonmember functions. – Kaz Apr 12 '13 at 22:01
  • @Kaz Ah yes, so it does, I forgot to revert to `(a == b)` instead of `Foo::operator==(a, b)` for the `static friend` version. – JBentley Apr 12 '13 at 23:05

6 Answers6

37

It seems like a perfectly valid usecase for a mutable member. You can (and should) still make your operator== take the parameters by const reference and give the class a mutable member for the hash value.

Your class would then have a getter for the hash value that is itself marked as a const method and that lazy-evaluates the hash value when called for the first time. It's actually a good example of why mutable was added to the language as it does not change the object from a user's perspective, it's only an implementation detail for caching the value of a costly operation internally.

Daniel Frey
  • 55,810
  • 13
  • 122
  • 180
  • 9
    Because it hasn't been mentioned — if you are planning to use `mutable` to perform caching, it would be advisable to do so in a thread-safe manner, because otherwise the fact that the `operator==` modifies the `Foo` could be observed by calling it on the same object from two threads simultaneously. Even if you never do this explicitly in your code, [it is the opinion of at least some C++ committee members](http://isocpp.org/blog/2012/12/you-dont-know-const-and-mutable-herb-sutter) that it would be legal for the standard library to do this. – Mankarse Apr 12 '13 at 16:03
  • 5
    note that you can make it thread safe by using 0 as a sentinal hash and doing `if(atomic_load(&hash)!=0)atomic_store(&hash,calc_hash());` – ratchet freak Apr 12 '13 at 16:40
12

Use mutable for the data that you want to cache but which does not affect the public interface.

U now, “mutate” → mutable.

Then think in terms of logical const-ness, what guarantees the object offers to the using code.

Cheers and hth. - Alf
  • 142,714
  • 15
  • 209
  • 331
3

You should never modify the object on comparison. However, this function does not logically modify the object. Simple solution: make hash mutable, as computing the hash is a form of cashing. See: Does the 'mutable' keyword have any purpose other than allowing the variable to be modified by a const function?

Community
  • 1
  • 1
ya23
  • 14,226
  • 9
  • 46
  • 43
1
  1. Having side effect in the comparison function or operator is not recommended. It will be better if you can manage to compute the hash as part of the initialization of the class. Another option is to have a manager class that is responsible for that. Note: that even what seems as innocent mutation will require locking in multithreaded application.
  2. Also I will recommend to avoid using the equality operator for classes where the data structure is not absolutely trivial. Very often the progress of the project creates a need for comparison policy (arguments) and the interface of the equality operator becomes insufficient. In this case adding compare method or functor will not need to reflect the standard operator== interface for immutability of the arguments.
  3. If 1. and 2. seem overkill for your case you could use the c++ keyword mutable for the hash value member. This will allow you to modify it even from a const class method or const declared variable
gsf
  • 6,612
  • 7
  • 35
  • 64
1

Yes, introducing semantically unexpected side-effects is always a bad idea. Apart from the other reasons mentioned: always assume that any code you write will forever only be used by other people who haven't even heard of your name, and then consider your design choices from that angle.

When someone using your code library finds out his application is slow, and tries to optimize it, he will waste ages trying to find the performance leak if it is inside an == overload, since he doesn't expect it, from a semantic point of view, to do more than a simple object comparison. Hiding potentially costly operations within semantically cheap operations is a bad form of code obfuscation.

Niels Keurentjes
  • 41,402
  • 9
  • 98
  • 136
  • Interesting point, and the same one mentioned in the [Google style guide](http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml?showone=Operator_Overloading#Operator_Overloading) linked to in one of the other comments. The hashing is designed to *reduce* the cost of the `==` operation in most cases, but you're right in that there will be cases where that is not the case e.g. where the user never compares the object more than once. So you're suggesting I should prefer a named equality member function (e.g. `bool equals(Foo& rhs)`), and documenting the performance semantics there? – JBentley Apr 13 '13 at 00:38
  • Yes, that would be far cleaner, and let the == operator just do the plain, potentially expensive plain comparison. The coder will know that the operation is computationally expensive if he puts a 4GB binary in it. – Niels Keurentjes Apr 13 '13 at 00:45
  • Huh? So first you're saying `==` should be computionally cheap because it's semantically cheap, now you're saying it should not be optimised. — I supposed what you mean is, the computation cost of `==` should be _predictable_, but as long as you have a fixed and reasonable worst-case bound I don't see what's wrong with performance variations. There are many operations which may take different time depending on hard-to-predict circumstances, for instance `std::vector::push_back` is very fast on avarage but sometimes needs to relocate the entire array. – leftaroundabout Apr 13 '13 at 11:00
  • Yeah the wording wasn't as clean as it could be hehe. I meant indeed the computational cost of a comparison should have a predictable linear relationship at worst to the size of its operands. Introducing side-effects could make it exponential. – Niels Keurentjes Apr 13 '13 at 11:05
  • Why would it become exponential? – leftaroundabout Apr 13 '13 at 11:06
  • If you are hashing the operands before the comparison it could very well be, or introducing other side-effects such as (down)load-on-demand or anything. – Niels Keurentjes Apr 13 '13 at 11:08
  • Only in a horribly bad implementation of hashing or downloading would take worse than linear time, but anyway. – leftaroundabout Apr 13 '13 at 11:34
  • "Could" was the keyword in my remark, anything that "could" cause a performance issue should not be hidden away behind syntax that doesn't hit at the possibility. – Niels Keurentjes Apr 13 '13 at 12:16
0

You can go the mutable route, but I'm not sure if that is needed. You can do a local cache when needed without having to use mutable. For example:

#include <iostream>
#include <functional> //for hash

using namespace std;

template<typename ReturnType>
class HashCompare{
public:
    ReturnType getHash()const{
        static bool isHashed = false;
        static ReturnType cachedHashValue = ReturnType();
        if(!isHashed){
            isHashed = true;
            cachedHashValue = calculate();
        }
        return cachedHashValue;
    }
protected:
    //derived class should implement this but use this.getHash()
    virtual ReturnType calculate()const = 0;
};



class ReadOnlyString: public HashCompare<size_t>{
private:
    const std::string& s;
public:
    ReadOnlyString(const char * s):s(s){};
    ReadOnlyString(const std::string& s): s(s){}

    bool equals(const ReadOnlyString& str)const{
        return getHash() == str.getHash();
    }
protected:
    size_t calculate()const{
        std::cout << "in hash calculate " << endl;
        std::hash<std::string> str_hash;
        return str_hash(this->s);
    }
};

bool operator==(const ReadOnlyString& lhs, const ReadOnlyString& rhs){ return lhs.equals(rhs); }


int main(){
    ReadOnlyString str = "test";
    ReadOnlyString str2 = "TEST";
    cout << (str == str2) << endl;
    cout << (str == str2) << endl;
}

Output:

 in hash calculate 
 1
 1

Can you give me a good reason to keep as to why keeping isHashed as a member variable is necessary instead of making it local to where its needed? Note that we can further get away from 'static' usage if we really want, all we have todo is make a dedicated structure/class

dchhetri
  • 6,926
  • 4
  • 43
  • 56
  • You mean keep a `HashCompare` alongside every `Foo`, and then check both locally each time or have a free function to do that? That breaks encapsulation (isHashed is a property of a `Foo`, so why does it live outside `Foo`?) and means I have to pass them around as pairs wherever they're needed. Whats the benefit? Please clarify if I've misunderstood your suggestion. – JBentley Apr 12 '13 at 23:11