1

Original description:

I am dealing with single threaded synchronous logic and I started observing some differences in output depending on whether I run code in debug or release mode. I started wondering where the lack of determinism can come from. I make a heavy use of LINQ, so I thought maybe collections like HashSet, which don't have inherent ordering, cause the issue?

If they aren't deterministic that would have some interesting consequences, like, for instance, First() would be really a random element with potentially some non-obvious probability distribution...

Further explanations:

I don't care about criterion of ordering. Specifically I don't care if it's insertion order. I only ask if the ordering is set and shared between order-sensitive operations on HashSet without any changes.

Example for illustration purposes:

I have a method, not relying on any state, that creates a HashSet. It puts some elements in and returns First().

  • Do I have guarantee that it will always be same element between method calls?

  • Will it be same element when the whole application is terminated and executed again?

If answer to any of these is no, that would mean to me, that from programmers perspective there is not determinism for LINQ order-sensitive operations on HashSet. Once again, I don't have requirement to know ordering criterion (comparer used internally).

Szymon Brych
  • 111
  • 2
  • 10
  • In short though, unless you use an `OrderBy` then order will be undefined. The existing implementation may order it a particular way - but that is an implementation detail rather than part of the contract. – mjwills Jul 11 '17 at 10:04
  • 2
    You should not rely on the order of a `HashSet<>` but it should not change anyway. Worth reading because related: https://stackoverflow.com/a/657289/284240 – Tim Schmelter Jul 11 '17 at 10:04
  • Possible duplicate of [Does HashSet preserve insertion order?](https://stackoverflow.com/questions/657263/does-hashset-preserve-insertion-order) – mjwills Jul 11 '17 at 10:05
  • Order is not significant to me, I only ask if its consistent. This is what I mean by determinism in this case. – Szymon Brych Jul 11 '17 at 10:09
  • @SzymonBrych: specify consistency. Between each run or method call, between different instances of your class, between frameworks, with multiple threads? After you have added or removed items from the hashset? – Tim Schmelter Jul 11 '17 at 10:11
  • @TimSchmelter Let's say I have a stateless method that creates hashset, puts some elements in, returns First(). Will it always be same element? – Szymon Brych Jul 11 '17 at 10:15
  • 1
    @SzymonBrych With the current implementation, as long as you never remove items, then yes it will always be the same element. But this is an implementation detail - **not part of the contract**. So, for example, when you update your .NET Framework version then it might change. If you want explicit order, you must use `OrderBy`. – mjwills Jul 11 '17 at 10:16
  • 2
    @SzymonBrych: yes, until you don't remove. But that might change with the next framework, you should not rely on it. Have you read J.Skeets answer i've linked above? There are other collections where the order is guaranteed. Of course you can also use `hashSet.OrderBy(x => something).First()`. But if you always do that you use the wrong collection because it's pretty inefficient. – Tim Schmelter Jul 11 '17 at 10:17
  • *Let's say I have a stateless method that creates hashset, puts some elements in, returns First(). Will it always be same element?* - are those reference or value elements? If reference, are they using the default `Object.GetHashCode()`? – dbc Jul 11 '17 at 10:20
  • @dbc Interesting question. I understand you are asking if objects inside collection are modified from outside which could potentially affect their hash code? In my specific case these are references with standard, not overridden GetHashCode – Szymon Brych Jul 11 '17 at 10:37

1 Answers1

2

If they aren't deterministic that would have some interesting consequences, like, for instance, First() would be really a random element with potentially some non-obvious probability distribution...

It is important to note that the two options here are not defined and random. There are in fact - defined and undefined. Undefined doesn't mean random or unpredictable - it must means I promise nothing.

MSDN explicitly states:

The HashSet class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order. (emphasis mine)

If you want to be sure about order (or you want it to be deterministic), then you need to use OrderBy, or a different type (e.g. SortedSet).

mjwills
  • 23,389
  • 6
  • 40
  • 63
  • Unless some entropy is introduced though, there are some scenarios where the items will always be returned in the same order (on successive runs). If the items have deterministic hashcodes and are added in the same order (always A B C) then they will always be returned in the same order (even if that is A C B) – Matthew Steeples Jul 11 '17 at 10:54
  • 1
    With the current implementation, yes @MatthewSteeples . **But that is not part of the contract.** And that is the key part of **undefined** - by saying the elements are in no particular order then MS are free to alter the implementation details (without breaching the contract - since it **promised nothing**). – mjwills Jul 11 '17 at 10:56
  • @MatthewSteeples I guess it's unpractical to define what that entropy would be? How about not successive runs? – Szymon Brych Jul 11 '17 at 11:23
  • 1
    @mjwills I undertand that it's a bad practice to assume determinism of ordering as it's not guaranteed regardless of current state. – Szymon Brych Jul 11 '17 at 11:23
  • @mjwills agreed, I was just trying to help to clarify your definition of undefined and random – Matthew Steeples Jul 11 '17 at 11:23
  • @SyzmonBrych We could give you an answer based on the current behaviour (which we have in the comments against your question). But ultimately, you are asking the _wrong_ question. **The whole point of Microsoft saying `whose elements are in no particular order` is to avoid this line of questioning.** If they say "we promise nothing" then there is no point clarifying how things will work - they've made it clear that you can't rely on any ordering. – mjwills Jul 11 '17 at 11:27