I encountered some odd behavior using a C# HashSet with LINQ's Join method that I don't understand. I've simplified what I am doing to focus on the behavior I am seeing.
I have the following:
private HashSet<MyClass> _mySet; // module level
IEnumerable<ISearchKey> searchKeys; // parameter.
// Partial key searches are allowed.
private IEqualityComparer<ICoreKey> _coreKeyComparer; // Module level.
// Compares instances of MyClass and ISearchKey to determine
// if they match.
Given that
- There is a 1-to-many relationship between searchKeys and _mySet.
- MyClass implements interface IPartialKey and ICoreKey.
- ISearchKey inherits from IPartialKey and ICoreKey.
- MyClass and ISearchKey instances both override the GetHashCode method.
- MyClass's hash code value is based on its full key value, which includes its ICoreKey and IPartialKey values plus other fields.
- The full keys used by MyClass are not unique. Two different MyClass instances can have the same hash code.
- ISearchKey's hash code value is based only on its ICoreKey and IPartialKey values, i.e. the ISearchKey hash code might differ from the hash code of a matching MyClass instance. (Side note: in the case where I first encountered the problem, the ISearchKey's IPartialKey values match the MyClass full key, so the GetHashCode methods would return the same value for both ISearchKey and MyClass. I included the additional complexity to better illustrate the underlying logic of what I am doing.)
- The _coreKeyComparer.GetHashCode method returns the same value for matching instances of ISearchKey and MyClass using only their ICoreKey values.
- The _coreKeyComparer.Equals method casts its parameters to MyClass and ISearchKey respectively and returns true if their IPartialKey values match. (Side note: _coreKeyComparer has been HEAVILY tested and works correctly.)
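For readers who prefer code to bullet points, here is a stripped-down sketch of the shape described above. It is not the real code: the member names CoreKey and PartialKey, the SearchKey class, the use of string keys, and the choice to return false when an `as` cast fails are all assumptions made purely for illustration. The GetHashCode overrides on the types themselves are left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical member names; the real interfaces are richer than this.
public interface ICoreKey { string CoreKey { get; } }
public interface IPartialKey { string PartialKey { get; } }
public interface ISearchKey : ICoreKey, IPartialKey { }

public sealed class MyClass : ICoreKey, IPartialKey
{
    public string CoreKey { get; set; }
    public string PartialKey { get; set; }
    public string Other { get; set; }   // stands in for the rest of the full key
}

// Hypothetical concrete search key, only here so the sketch compiles.
public sealed class SearchKey : ISearchKey
{
    public string CoreKey { get; set; }
    public string PartialKey { get; set; }
}

public sealed class CoreKeyComparer : IEqualityComparer<ICoreKey>
{
    // Treats the first argument as a MyClass and the second as an ISearchKey.
    // Assumption: a failed cast yields false rather than an exception.
    public bool Equals(ICoreKey x, ICoreKey y)
    {
        var myClass = x as MyClass;
        var searchKey = y as ISearchKey;
        if (myClass == null || searchKey == null)
        {
            return false;
        }
        return myClass.PartialKey == searchKey.PartialKey;
    }

    // Uses only the ICoreKey value, so matching instances of either type
    // land on the same hash code.
    public int GetHashCode(ICoreKey obj)
    {
        return obj.CoreKey.GetHashCode();
    }
}
```

With these stand-ins, a MyClass and a SearchKey that share CoreKey and PartialKey compare equal when passed in that order and produce the same comparer hash code.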
I expected a join between the two collections to result in something like:
{searchKey_a, myClass_a1},
{searchKey_a, myClass_a2},
{searchKey_a, myClass_a3},
{searchKey_b, myClass_b1},
{searchKey_b, myClass_b2},
{searchKey_c, myClass_c1},
{searchKey_c, myClass_c2},
{searchKey_c, myClass_c3},
{searchKey_c, myClass_c4},
etc....
That is, the same ISearchKey instance would occur multiple times, once for each matching MyClass instance it is joined to.
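To make that expectation concrete, here is a minimal sketch using plain string keys and the default comparer instead of my real types. The class name, the data, and the "first character is the key" rule are made up for illustration. In this simple case Join yields one result per matching outer/inner pair, which is the one-to-many shape I was expecting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinExpectationSketch
{
    // Hypothetical stand-in data: each inner item starts with the key it belongs to.
    public static List<string> Run()
    {
        var outerKeys = new List<string> { "a", "b" };
        var innerItems = new HashSet<string> { "a1", "a2", "a3", "b1", "b2" };

        return outerKeys
            .Join(
                innerItems,
                key => key,                        // outer key selector
                item => item.Substring(0, 1),      // inner key: first character
                (key, item) => key + ":" + item)   // one result per matching pair
            .ToList();
    }

    public static void Main()
    {
        foreach (var pair in Run())
        {
            Console.WriteLine(pair);
        }
    }
}
```

Here "a" is paired with all three of its inner items and "b" with both of its items, five pairs in total.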
But when I do a join from searchKeys to _mySet:
var matchedPairs = searchKeys
    .Join(
        _mySet,
        searchKey => searchKey,
        myClass => myClass,
        (searchKey, myClass) => new { searchKey, myClass },
        _coreKeyComparer)
    .ToList();
I only get one MyClass instance per ISearchKey instance. That is, the matchedPairs collection looks like:
{searchKey_a, myClass_a1},
{searchKey_b, myClass_b1},
{searchKey_c, myClass_c1},
etc....
However, if I reverse the join and go from _mySet to searchKeys:
var matchedPairs = _mySet
    .Join(
        searchKeys,
        myClass => myClass,
        searchKey => searchKey,
        (myClass, searchKey) => new { searchKey, myClass },
        _coreKeyComparer)
    .ToList();
I get the correct matchedPairs collection. All the matching records from _mySet are returned along with the searchKey which they matched against.
I checked the documentation and examined multiple examples, and I don't see any reason why the searchKeys-to-_mySet Join gives the wrong answer while the _mySet-to-searchKeys Join gives a different, correct answer.
(Side note: I also tried GroupJoin from searchKeys to _mySet and got similar results, i.e. each ISearchKey instance found at most one result from _mySet.)
Either I don't understand how the Join method is supposed to work, or Join works differently with HashSet than it does with List or other types of collection.
If the former, I need clarification so I don't make mistakes using Join in the future.
If the latter, is this differing behavior a .NET bug, or is it the correct behavior with HashSet?
Assuming the behavior is correct, I would greatly appreciate someone explaining the underlying logic behind this (unexpected) Join/HashSet behavior.
Just to be clear, I've already fixed my code so it returns the correct results. I just want to understand why I got incorrect results initially.