8

I have a method that uses recursion to traverse a tree and update the items.

Currently the method takes pretty long to process all the items, so i started optimizing things. Among those things is the use of a dictionary instead of executing a DB query for each item.

The dictionary is defined as

System.Collections.Generic.Dictionary<EffectivePermissionKey, MyData>

The key type is defined as

private struct EffectivePermissionKey
{
  // http://blog.martindoms.com/2011/01/03/c-tip-override-equals-on-value-types-for-better-performance/
  public override bool Equals(object aObject)
  {
    if (aObject == null)
      return false;
    else
      return aObject is EffectivePermissionKey && Equals((EffectivePermissionKey)aObject);
  }

  public bool Equals(EffectivePermissionKey aObject)
  {
    return this.ID == aObject.ID && this.OrchardUserID == aObject.OrchardUserID;
  }

  public override int GetHashCode()
  { 
    // http://stackoverflow.com/a/32502294/3936440
    return unchecked(ID.GetHashCode() * 23 * 23 + OrchardUserID.GetHashCode() * 23);
  }

  public int ID;
  public int OrchardUserID;
}

When the method runs it takes around 5000 recursions to update all items.

Initially it took around 100 seconds without the dictionary.

A first approach with DB queries replaced by the use of a dictionary with int keys took 22 seconds.

Now, with DB queries replaced by the use of the dictionary defined above and proper TryGetValue() calls it takes 97 seconds <- WAT.

What is going on here? What could cause this massive performance drop?

Edit

At first, it seemed like a hash collision issue to me, so i added a breakpoint in EffectivePermissionKey.Equals() to verify that this method is called but it's not called, therefore no hash collision i guess.

Edit2

Now i'm confused. I thought Equals() only gets called when the hash code does not match. After printing out the hash codes of my keys and the keys used in TryGetValue() i see that these codes match. Then i looked at the source code of Dictionary<> and there's a line in FindEntry() that looks like this:

if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;

This means that for each item key in the dictionary the GetHashCode() and Equals() gets called because i process all items in the dictionary as the items are the result of the DB query whereas these results where processed before the dictionary approach anyway.

ViRuSTriNiTy
  • 5,017
  • 2
  • 32
  • 58
  • 1
    I don't see any problem with your `Equals` and `GetHashCode` (I do prefer the 17/23 "cumulative" presented here http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode, but still your "non cumulative" version shouldn't cause too many collisions – xanatos Feb 05 '16 at 10:28
  • 5
    Maybe it's because of boxing/unboxing? `EffectivePermissionKey` does not implement `IEquatable` That means the dictionary will use an `ObjectEqualityComparer` – Dennis_E Feb 05 '16 at 10:29
  • 1
    You're not implementing `IEquatable` in your struct, so your struct will be boxed and that hurts performance. – Sriram Sakthivel Feb 05 '16 at 10:30
  • @xanatos Yes, hash collisions came to mind immeditelly, see my edit – ViRuSTriNiTy Feb 05 '16 at 10:31
  • @Dennis_E Nice, i read about this some time ago but couldn't remember, will try it asap – ViRuSTriNiTy Feb 05 '16 at 10:31
  • How many items in the tree / dictionary? – Roland Feb 05 '16 at 10:32
  • Crystal ball says that you ought to look at the Output window and see a very large number of "first chance exception" notifications. – Hans Passant Feb 05 '16 at 10:32
  • @Roland, no, the dictionary has around 30.000 keys. – ViRuSTriNiTy Feb 05 '16 at 10:33
  • @HansPassant Nope, no exceptions. – ViRuSTriNiTy Feb 05 '16 at 10:34
  • So you use a dictionary instead of a database query. Does this introduce a new step to fill the dictionary with one big database query to replace many small database queries? Have you profiled your code to figure how much time is spent with the database versus the dictionary code? – Roland Feb 05 '16 at 10:41
  • @Roland Yeah, the dictionary fill is not the issue, my measurements refer to the traversal with the `TryGetValue()` calls – ViRuSTriNiTy Feb 05 '16 at 10:51
  • @Dennis_E Implementing `IEquatable<>` helps in avoiding a call to `Equals(object)` but it doesnt improve the performance at all, its around 1 second. I think i'm approaching this optimization task the wrong way. See my edit2. – ViRuSTriNiTy Feb 05 '16 at 11:30
  • 1
    Regarding the `GetHashCode` and `Equals`: Two objects with different hash should never be equal, but two objects with same hash might still be different. This is, why an equal hash is followed by an equality check and only then the item is returned. – grek40 Feb 05 '16 at 11:43
  • 1
    Can you post a fuller [mcve] that exhibits the problem? – Lasse V. Karlsen Feb 05 '16 at 12:48
  • @LasseV.Karlsen Nah, no need, see my answer, i got it all wrong. – ViRuSTriNiTy Feb 05 '16 at 17:35

1 Answers1

2

Nevermind folks, sorry for taking your time, my approach was completely wrong. Let me tell why.

The issue broken down for simplicity:

A -> recursion 1, DB query for permission of node A with ID = 1
  B -> recursion 2, DB query for permission of node B with ID = 2
  C -> recursion 3, DB query for permission of node C with ID = 3
    D -> recursion 4, DB query for permission of node D with ID = 4

As you can see, one DB query per tree node.

Now the flawed approach for optimizing this:

Dictionary<int, PermissionData> myMap

...

DB query of all permissions and insert into myMap

...

A -> recursion 1, myMap.TryGetValue(1, out ...)
  B -> recursion 2, myMap.TryGetValue(2, out ...)
  C -> recursion 3, myMap.TryGetValue(3, out ...)
    D -> recursion 4, myMap.TryGetValue(4, out ...)

You see now that the query is done once but on each node aTryGetValue() call is made.

In my specific case this is actually slower as executing single queries because

  • the dictionary contains as much keys as nodes exist because each node has a DB permission entry

and

  • each TryGetValue() requires / results in

    1. creating a key instance (with ID and user ID)
    2. calling TryGetValue()
    3. calculating the hash of the key instance
    4. calling Equals()

These 4 steps are executed around 5000 times versus executing 5000 simple entity framework queries (SELECT * FROM table WHERE ID = ...). I don't know why, but the queries are faster here, perhaps the compiler optimizes something away.

Anyway, i reworked the whole thing and now i have an outer loop over User IDs and an inner recursive traversal witch uses dictionaries with a simple int key (node ID). It gives me lighting fast results. The whole execution now takes around 16 seconds and with a few more tweaks and threading i got it down to under 1 second. Mission accomplished.

Edit

After discussing this issue with a colleague we came to the conclusion that the performance issue is most likely caused by the prime numbers used in the hash code calculation. I used 23 x 23 x 23 but it should be something like 17 x 23 x 23 to avoid collisions but i cannot test this as the concerning code / application is not in my responsibility anymore. Btw a generic solution can be found here: https://stackoverflow.com/a/763966/3936440

Edit 2

As noted by a colleague the following answer suggests to not use 17 and 23, instead use larger prime numbers, see https://stackoverflow.com/a/38281271/3936440

ViRuSTriNiTy
  • 5,017
  • 2
  • 32
  • 58
  • 3
    I seriously doubt a single dictionary lookup is slower than a database query, there is something else at play here so I don't consider this a good answer to your question to be honest. – Lasse V. Karlsen Feb 05 '16 at 19:11
  • @LasseVågsætherKarlsen Answer improved by recent findings – ViRuSTriNiTy Mar 13 '19 at 16:06
  • 1
    Yes, if your hashcode function is defunct, that could explain it as you essentially end up with a poorly performing dictionary which in some cases might end up doing a linear search for your objects. – Lasse V. Karlsen Mar 15 '19 at 11:49