Is it a bad idea creating an object in a GetHashCode function?

Question

I have a base class MyBase and about a dozen of derived classes. I rely on the visitor pattern heavily in my code. So the base class is an abstract host for a visitor, each of the derived classes is a concrete host. I am using a standalone comparer that implements the IEqualityComparer<MyBase> interface. There are 2 methods in this interface: Boolean Equals(MyBase a, MyBase b) and Int32 GetHashCode(MyBase obj). In each of these methods I use a visitor in order to resolve an instance of MyBase to the instance of one of the derived types. This is how I avoid dealing with casting. So the visitor is an object that needs to be created on each call to Equals and GetHashCode. I've read a lot about keeping GetHashCode as cheap as possible, so the question would be:

Is creating an object (a visitor) in the GetHashCode and Equals method a bad idea considering the performance?

**Is creating an object in the GetHashCode method a bad idea considering the performance?** How on earth should we know? **You're the one who knows how fast it is and you're the one who knows whether that speed is acceptable to your customers or not**. If you don't know the speed, **measure it**. If you don't know what's acceptable to your customers, **ask them**. Neither of these questions can be answered by asking strangers on the internet. — Eric Lippert, Jun 29 '13 at 02:52
@EricLippert, there are a lot of caveats in C# like calling a virtual method from a constructor which are not prohibited but there is a good chance you shoot yourself in a foot. After reading MSDN I got an impression there is something special going on about GetHashCode. It must not throw exceptions (why?). It must be inexpensive to call(why?) What is considered inexpensive? Is creating an object a inexpensive thing to do? Is there anything else I am supposed to know about circumstances in which GetHashCode operates? How am I supposed to know about them besides asking a question? — Trident D'Gao, Jun 29 '13 at 03:38
Well that's an entirely different question you're asking now. You're supposed to know about them by doing a little basic research. Typing "GetHashCode" into a search engine would for example lead you to my article on the subject, which answers your questions. http://blogs.msdn.com/b/ericlippert/archive/2011/02/28/guidelines-and-rules-for-gethashcode.aspx — Eric Lippert, Jun 29 '13 at 06:00
Alternatively you could search StackOverflow for similar questions, like this one: http://stackoverflow.com/questions/462451/gethashcode-guidelines-in-c-sharp — Eric Lippert, Jun 29 '13 at 06:02
Thanks, I like your blog and this article is helpful although it doesn't answer the question why the exception **must not** be thrown. I particularly don't understand the "must" obligation. The common sense suggests there **should be no exceptions** because the hash value is something relatively straightforward to calculate and there simply not that many places where the exception can be thrown (division by 0?). It is a sort of sanity check, if your function is complex to the extent that you cannot go along without throwing an exception the code needs to be reconsidered. But it's not a must. — Trident D'Gao, Jun 29 '13 at 15:15
It should *always* be legal to put an object, in *any state whatsoever* into a hash table, and therefore GetHashCode must not throw. — Eric Lippert, Jun 30 '13 at 00:27

score 2 · Answer 1 · answered Jun 28 '13 at 20:59

2

The question presumes that creating a new object is expensive. It is not. Creating a new object involves moving the heap pointer up by the size of the object, zeroing out those bytes, and then calling the constructor. As long as the constructor doesn't do anything expensive, and there is sufficient memory for the allocation (if there isn't, the GC needs to run and that is not cheap) this will be very fast.

The "cost" associated with creating new objects comes when they need to be collected. Creating more objects means that you need to collect the garbage more often, so you're slowing down your code at some indeterminate point in the future. Having said that, unless you're creating a lot of objects, or very large objects, and those objects are not long lived, this shouldn't be a problem at all.

answered Jun 28 '13 at 20:59

Servy

202,030
26
332
449

2

I thought the guidelines for `GetHashCode` specify that it should never throw. – Dustin Kingen Jun 28 '13 at 21:01
+1. @Romoku how creation of an object is related to "should never throw"? – Alexei Levenkov Jun 28 '13 at 21:07
@AlexeiLevenkov Out of memory exception. – Dustin Kingen Jun 28 '13 at 21:12
2

@Romoku I don't think "should never throw" means "must handle all exceptional cases", but rather "should not throw in normal code flow". Otherwise you also rule out *any* function calls since they can cause SO at very least, maybe even failure to JIT or other insane exceptions. – Alexei Levenkov Jun 28 '13 at 21:20
"Implementations of the GetHashCode method must not throw exceptions." http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx – Jim Mischel Jun 28 '13 at 23:22
@JimMischel First off, that's a convention, not a rule. Technically nothing prevents you from doing that. Next, it's literally impossible for any method to ever *guarantee* that it never throws. No matter what your code contains it can *always* throw a `ThreadAbortException`. The idea is that you should do everything that you can to avoid throwing an excpetion in a `GetHashCode` call, and avoid performing actions with a non-negligible chance of failing, even exceptionally. FYI, don't take MSDN statements as gospel, there's a lot of marginal/bad info there in places. – Servy Jun 28 '13 at 23:27
@Servy: Actually there are ways to make methods that don't throw ThreadAbortException. For example, you can run the entire method inside a finally block. A thread abort exception will be deferred until control leaves the finally. – Eric Lippert Jun 29 '13 at 02:54
I dislike the philosophy that says one should regard new object creation as cheap. It's often very cheap, and usually isn't very expensive, but code which generates a lot of unnecessary garbage will rely upon the surrounding program not to have certain memory usage patterns. I prefer not to impose such requirements on the surrounding code absent a good reason for doing so. – supercat Jun 29 '13 at 23:52

score 1 · Accepted Answer · answered Jun 29 '13 at 23:50

Implementations of GetHashCode() should be written in such a fashion that every call after the first will be fast. If computation of an object's hash value would require the construction of a new object, that's a pretty good sign that the object should probably cache the hash value after it's been computed once, so future requests for the hash code can be answered quickly. Note that it probably isn't worthwhile to use a flag to indicate whether the hashcode has been computed; instead, decide that zero won't be a valid hash code value, and have a HashCode field which is initialized to zero but gets overwritten with the hash code once it has been computed. If the computation method for the hash code would yield zero, substitute some other arbitrary value instead. Both zero and the other value should only occur about 1/4,000,000,000 of the time, so doubling the probability of that other value to 1/2,000,000,000 should be no big deal.

If you are caching hash codes, then it shouldn't matter too much if their computation is somewhat expensive, since it will only be done once. Even if a hash code is cached, one should still try to make it reasonably fast, but not worry too much about it.

Is it a bad idea creating an object in a GetHashCode function?

2 Answers2