I have two IEnumerable<string> sequences: one holds a "must-have" base set of keys, the other holds a lot of keys in all kinds of spellings (the same values, but containing escape sequences, different capitalization, etc.). I want to map every key in the first list to all the strings in the second list that correspond to it. The predicate is bool IsPossibleMatch(string a, string b).

So basically I want an IGrouping<string, IEnumerable<string>>, because that's exactly the structure I'm looking for.

I tried various versions of keys.GroupBy((x,y) => IsPossibleMatch(x, y)) but couldn't get any of them to work.

Now I'm wondering: is this even possible with GroupBy? Or do I need to do this manually in some other way?

Vaethin
  • Please show some example data, and what have you tried? What does "all kinds of spellings" mean? Is it only upper/lower case differences? – Gilad Green Apr 08 '19 at 06:02
  • Upper/lower, yes, but there are a few other things as well; for example, I also have to look out for certain escaped sequences. The first list could, for example, contain "AlphaBeta", and the grouping would then contain values such as {"AlphaBeta", "alphaBeta", "alpha<beta"}, etc. I have already written the predicate for that, though: `bool IsPossibleMatch(string a, string b)` – Vaethin Apr 08 '19 at 06:18
  • Please edit the question to include this information. – Gilad Green Apr 08 '19 at 06:18

1 Answer

Here is some sample code:

var left = new string[] {
    "key1",
    "key2",
    "key3",
    "key4",
    "key5"
};
var right = new string[]
{
    "key1",
    "Key1",
    "KEy1",
    "KEY1",
    "KeY1",
    "kEY1",
    "kEy1",
    "key2",
    "Key2",
    "KEy2",
    "KEY2",
    "KeY2",
    "kEY2",
    "kEy2"
};
var output = left.GroupJoin(
    right,
    leftStr => leftStr.ToLower(),
    rightStr => rightStr.ToLower(),
    (x,y)=>(x,y)
);
foreach (var leftKey in output)
    Console.WriteLine(leftKey.x + "->" + string.Join(",", leftKey.y));

This produces the following output:

key1->key1,Key1,KEy1,KEY1,KeY1,kEY1,kEy1
key2->key2,Key2,KEy2,KEY2,KeY2,kEY2,kEy2
key3->
key4->
key5->
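
The GroupJoin above needs a key selector that maps matching strings to the same key. When only a pairwise predicate is available, as in the question, one alternative is to filter the second list once per key. The following is a minimal sketch of that idea, not part of the original answer: `PredicateGrouping` and `MapByPredicate` are hypothetical names, and the `IsPossibleMatch` shown here is only a case-insensitive stand-in for the asker's real predicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateGrouping
{
    // Hypothetical stand-in for the asker's predicate; it only ignores case.
    public static bool IsPossibleMatch(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

    // Maps every key in `left` to all entries of `right` accepted by the predicate.
    // Keys without any match get an empty list.
    public static Dictionary<string, List<string>> MapByPredicate(
        IEnumerable<string> left, IEnumerable<string> right)
    {
        // Materialize once so `right` is not re-enumerated for every key.
        var candidates = right.ToList();

        return left.ToDictionary(
            key => key,
            key => candidates.Where(c => IsPossibleMatch(key, c)).ToList());
    }
}
```

This calls the predicate once for every (key, candidate) pair, so it costs more than a hash-based join on large inputs, and `ToDictionary` throws if `left` contains duplicate keys.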
Leighton Ritchie
  • GroupJoin! Didn't even know that existed! I figured I had to join, but I didn't want to mix up the lists by first joining and then grouping. Thank you! – Vaethin Apr 08 '19 at 06:51
  • Follow-up question, though: I am trying to modify my existing IsPossibleMatch to implement IEqualityComparer so I can get the right result. It's easy for the Equals(a, b) method, but I wonder if I require it at all. Am I guessing correctly that GroupJoin first calls GetHashCode() to check whether equality is possible at all, and then eliminates false positives (collisions) by calling Equals()? Because for my use case I could just make GetHashCode() return the same int every time, so I don't have to do the same compute-intensive calculations twice, in both Equals and GetHashCode. @Leighton Ritchie – Vaethin Apr 08 '19 at 10:12
  • Looking at the [reference source](https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,c5fc2efafc275a76) shows that it uses the default equality comparer, `EqualityComparer<TKey>.Default`. Perhaps you can try creating your own class with its `Equals` method overridden; then you can make it do whatever you need inside. However, this is quite a micro-optimisation and I don't expect you to gain a significant performance improvement. It's more likely that the bottleneck is elsewhere. – Leighton Ritchie Apr 09 '19 at 03:19
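
The constant-hash idea from the follow-up comment could be sketched as below, using the `GroupJoin` overload that accepts an `IEqualityComparer<TKey>`. This is an illustration under stated assumptions, not code from the thread: `PossibleMatchComparer` and `ComparerDemo` are hypothetical names, and the predicate is passed in as a delegate so that the asker's `IsPossibleMatch` could be supplied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer that delegates equality to a match predicate.
public sealed class PossibleMatchComparer : IEqualityComparer<string>
{
    private readonly Func<string, string, bool> _isPossibleMatch;

    public PossibleMatchComparer(Func<string, string, bool> isPossibleMatch) =>
        _isPossibleMatch = isPossibleMatch;

    public bool Equals(string a, string b) => _isPossibleMatch(a, b);

    // Constant hash: all keys share one bucket, so Equals alone decides matches.
    public int GetHashCode(string s) => 0;
}

public static class ComparerDemo
{
    // Groups `right` under each key of `left` using the supplied comparer.
    public static List<(string Key, List<string> Matches)> Group(
        IEnumerable<string> left,
        IEnumerable<string> right,
        IEqualityComparer<string> comparer) =>
        left.GroupJoin(
                right,
                l => l,
                r => r,
                (l, rs) => (Key: l, Matches: rs.ToList()),
                comparer)
            .ToList();
}
```

With a constant hash code the lookup degrades to comparing keys one by one with `Equals`, so the hash-based speedup is lost. The grouping is also only well defined if the predicate behaves like an equivalence relation (symmetric and transitive); otherwise the result may depend on the order of the elements.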