13

I have heard that the .NET System.Collections.Immutable collections are implemented as balanced binary trees in order to satisfy their immutability constraints, even collections which traditionally model hash tables like Dictionary, by using the integral value of GetHashCode as a sort key.

If I have a type for which it is cheap to generate a hash code, and for which is cheap to compare (e.g. string or int), and I don't care about the sorted-ness of my collection, would it make sense to prefer ImmutableSortedDictionary because the underlying data structure is sorted anyway?

Billy ONeal
  • 104,103
  • 58
  • 317
  • 552
  • 2
    *If I have a type for which it is cheap to generate a hash code, and for which is cheap to compare (e.g. string or int), and I don't care about the sorted-ness of my collection.* I think you've answered yourself. Keep it simple. From another programmers point of view, i'd be confused to read the code and find out the sortness of a sorted dictionary is useless, leaving me wondering why it was used in the first place. – Yuval Itzchakov Apr 08 '15 at 19:28
  • @Yuval: Given that the data structure is an AVL tree, IMHO mimicking a hash table interface on top is less "simple". – Billy ONeal Apr 08 '15 at 19:58
  • Which data structure are you talking about? – Yuval Itzchakov Apr 08 '15 at 20:17
  • 2
    @YuvalItzchakov: `ImmutableDictionary` and `ImmutableSortedDictionary` (which are both AVL trees) – Billy ONeal Apr 08 '15 at 21:42
  • 1
    Why does that matter then? Why not use the `ImmutableDictionary` if sortness is a non issue – Yuval Itzchakov Apr 08 '15 at 21:45
  • 2
    @Yuval: Because `ImmutableDictionary` *is* sorted. It's just sorted by the numeric value of `GetHashCode` rather than by an intrinsic sort provided by the key type. – Billy ONeal Apr 08 '15 at 21:47
  • 1
    But thats an *implementation detail* which may change at any given time. Do you care about that at all? I'm having a hard time understanding what the benefit of a `ImmutableSortedDictionary` is from your point of view – Yuval Itzchakov Apr 08 '15 at 21:49
  • 2
    @Yuval: It's not an implementation detail any more than Dictionary being a hash table is an implementation detail. The value would be that the sorted-by-T version may be faster in typical use cases than the sorted-by-GetHashCode version in the same way Dictionary is typically faster than SortedDictionary. – Billy ONeal Apr 08 '15 at 21:51
  • 1
    Then why not simply put it to a test? We don't know your data structures and you asked a rather wide question. Wouldn't that be more effective? – Yuval Itzchakov Apr 08 '15 at 21:53
  • 1
    @Yuval: For the same reason I don't build every program ever twice to see if SortedDictionary is faster than Dictionary in the mutable case; I'm looking for the "reasonable default to reach for in most cases". – Billy ONeal Apr 08 '15 at 21:54

3 Answers3

13

The answer is yes, it can make sense to prefer ImmutableSortedDictionary in certain conditions, for instance with Int32 keys.

In my case, with Int32 keys I found out that ImmutableSortedDictionary was a better pick.

I have run a small benchmark using 1 million items:

  • Insert 1,000,000 items in ascending order of key
  • Update 1,000,000 random items
  • Scan 1,000,000 items, i.e. iterate once over each item in the collection
  • Read 1,000,000 random items
  • Delete 1,000,000 random items

ImmutableDictionary<int, object>

Insert: 2499 ms
Update: 7275 ms
Scan:    385 ms
Read:    881 ms
Delete: 5037 ms

ImmutableSortedDictionary<int, object>

Insert: 1808 ms
Update: 4928 ms
Scan:    246 ms
Read:    732 ms
Delete: 3522 ms

ImmutableSortedDictionary is a bit faster than ImmutableDictionary on all operations. Note that insertion was done one item at a time in ascending order of key (because it happens to match my particular use case).

However, you should also consider using a mutable collection with some locking. Writing to a mutable Dictionary<int, object> is one order of magnitude faster.

ZunTzu
  • 7,244
  • 3
  • 31
  • 39
0

A hash-based collection should be significantly faster on .NET because:

  1. It can use a more efficient search tree specialized for int keys such as a hash trie or Patricia tree.

  2. Its inner loop will do almost entirely int comparisons rather than generic comparisons.

However, if you need better performance you will usually be much better off switching to a mutable collection like HashSet.

J D
  • 48,105
  • 13
  • 171
  • 274
-3

It should not really matter. The time complexities from this blog post show that while you do get better performance from Dictionary.Add than SortedDictionary.Add (O(1) vs O(log n)) both ImmutableDictionary and ImmutableSortedDictionary have a time complexity of O(log n)

Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
  • 1
    1. Well, I'm more interested in practical behavior; `ImmutableArray` is much faster to iterate than `ImmutableList`, for example; but has the same complexity. 2. Of course ImmutableDictionary falls back to linear behavior in the event `GetHashCode` values are the same... – Billy ONeal Apr 08 '15 at 20:03
  • @BillyONeal The one thing that the article was not clear about is the worst case of `ImmuteableSortedDictionary` I am not sure if it is `O(n log n)` or `O(n)`. – Scott Chamberlain Apr 08 '15 at 20:41