I have 2 different IEnumerables:

IEnumerable<TypeA> ListA & IEnumerable<TypeB> ListB

Both types have the property called "PersString".

My goal is to get, for each item in ListA, the corresponding items of ListB with the same "PersString".

I started with a foreach loop over ListA containing a nested foreach loop over ListB, checking whether the "PersString" of the ListA item matches the "PersString" of the ListB item.

Is there a more efficient way of coding using Linq?

Thank you.

ASh
CombatKarl
  • See [Linq Join](https://learn.microsoft.com/en-us/dotnet/csharp/linq/perform-inner-joins) – JonasH Sep 14 '22 at 14:27
  • 1
    Note that LINQ is for convenience, not performance. Transforming one of the lists to a dictionary, using your property as the key, should give similar performance to LINQ. At least assuming the property is unique in each list. – JonasH Sep 14 '22 at 14:30
  • @JonasH I guess you haven't seen the new optimizations in .NET 7... LINQ's Min, Max, Average and Sum have become insanely fast due to vectorization. Can't beat that with a for loop. [Linking Nick Chapsas](https://youtu.be/zCKwlgtVLnQ). Anyhow, your tip about the dictionary is true, but it only matters if you plan to do this matching more often. Building the dictionary isn't free. Everything always depends. – JHBonarius Sep 15 '22 at 10:30
  • @JHBonarius I'm not trying to imply that Linq is *bad*. The performance / readability ratio is fantastic, and that is usually much more important than the absolute performance. But in some cases lower level code is preferable. In this particular case I would expect Join/intersect to use some type of lookup table to get `O(n log n)` runtime, similar to that of using a dictionary, even if the constant factors might be a bit different. Either would be better than `O(n^2)` of the double loop OP describes for any significant n. – JonasH Sep 15 '22 at 11:40
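
The lookup-table idea discussed in these comments can be sketched roughly as follows. This is only an illustration, not code from the thread: the record definitions for `TypeA` and `TypeB` and the `LookupMatcher` helper are hypothetical stand-ins, and `ToLookup` is used instead of a plain dictionary so that "PersString" does not have to be unique in ListB.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's TypeA and TypeB.
public record TypeA(string PersString, string Name);
public record TypeB(string PersString, int Value);

public static class LookupMatcher
{
    // Pairs every ListA item with all ListB items that share its PersString.
    public static List<(TypeA A, List<TypeB> Matches)> Match(
        IEnumerable<TypeA> listA, IEnumerable<TypeB> listB)
    {
        // One pass over listB builds a hash-based lookup: key -> all items with that key.
        ILookup<string, TypeB> byPersString = listB.ToLookup(b => b.PersString);

        // Indexing a lookup with a missing key yields an empty sequence, not an exception.
        return listA
            .Select(a => (A: a, Matches: byPersString[a.PersString].ToList()))
            .ToList();
    }
}
```

Building the lookup costs one pass over ListB, so this mostly pays off when ListA is not tiny or when the lookup is reused for several matching runs.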

3 Answers

2

Is there a more efficient way of coding using Linq?

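Yes, you can join them. In LINQ to Objects this is (much) more efficient than nested loops, because Enumerable.Join builds a hash-based lookup over one sequence instead of comparing every pair: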

var query = from a in ListA 
            join b in ListB on a.PersString equals b.PersString
            select (A: a, B: b);
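
If one group of ListB matches per ListA item is wanted (rather than one flat pair per match), a group join is a possible variation. The following is a minimal sketch and not part of the original answer; the `TypeA`/`TypeB` records and the `GroupJoinSketch` class are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's TypeA and TypeB.
public record TypeA(string PersString);
public record TypeB(string PersString, int Value);

public static class GroupJoinSketch
{
    // Returns each ListA item together with the (possibly empty) list of its ListB matches.
    public static List<(TypeA A, List<TypeB> Matches)> MatchPerItem(
        IEnumerable<TypeA> listA, IEnumerable<TypeB> listB)
    {
        // "join ... into" compiles to Enumerable.GroupJoin, which keeps
        // ListA items that have no match and gives them an empty group.
        var query = from a in listA
                    join b in listB on a.PersString equals b.PersString into matches
                    select (A: a, Matches: matches.ToList());
        return query.ToList();
    }
}
```

Unlike the plain join, this keeps ListA items without any match, which may or may not be what is needed.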
Tim Schmelter
  • `Where` is not the best choice to create the intersection of a data set, especially if the definition of equality is complex. For this purpose we should use `Intersect`. – BionicCode Sep 15 '22 at 09:43
  • @BionicCode: `Intersect` is not possible if you have two different types as here. Also, i'm not using `Where` but `Enumerable.Join` which is a set based approach(like `Intersect`). – Tim Schmelter Sep 15 '22 at 09:44
  • The benchmark you are referring to compares Join to Where. That's what I mean. Where is more useful when searching in a single set. It would be more interesting to know Join vs Intersect (or IntersectBy). IntersectBy allows to find the intersection of collections of different type. – BionicCode Sep 15 '22 at 09:49
  • @BionicCode: the `Where` approach which the `Join` is compared with is the same as what OP is using with a nested `foreach` loop. That's why it's relevant to mention that `Join` is more efficient(in that benchmark 225 times faster). – Tim Schmelter Sep 15 '22 at 11:39
  • I never believed in `Where` being a solution. When comparing each element of one set with each element of another set, it should be instantly clear that `Where` is not worth considering, at least when efficiency matters. `Where` disqualifies itself and was never meant to be used in such a use case. `Where`, like in SQL, is meant to test an element of a set against a condition, not against elements of a different set, and not to merge sets. To me, in this scenario comparing `Join` to `Where` is like comparing `Join` to `Select`. It's just meaningless to compare the two in the current scenario. – BionicCode Sep 15 '22 at 11:57
1

In addition to Enumerable.Join, LINQ offers the Enumerable.Intersect method and since .NET 6 the more convenient and powerful Enumerable.IntersectBy.

In case of Enumerable.Intersect, more complex types require you to provide an IEqualityComparer<T> implementation or let the data type itself implement IEquatable<T> to define equality of this type.

Example Intersect (Prior to .NET 6):

Does not support comparison of two sets of different type.

class Person : IEquatable<Person>
{
  public bool Equals(Person p) => this.PersString == p?.PersString;
  public override int GetHashCode() => HashCode.Combine(PersString);

  public int ID { get; set; }
  public string PersString { get; set; }
}
IEnumerable<Person> collectionA;
IEnumerable<Person> collectionB;

IEnumerable<Person> equalPersonInstances = collectionA.Intersect(collectionB);

// If the compared type does not implement IEquatable<T>, we would have to provide an IEqualityComparer<T>:
// IEnumerable<Person> equalMyTypeInstances = collectionA.Intersect(collectionB, new MyComparer());

Example IntersectBy (.NET 6 and later):

Since .NET 6 we can use the ...By methods to pass in a lambda expression or method group as the key selector. In this case we call Enumerable.IntersectBy, which can find the intersection of two sets of different types.

IEnumerable<PersonA> collectionA;
IEnumerable<PersonB> collectionB;

IEnumerable<PersonA> intersection = collectionA.IntersectBy(
  collectionB.Select(personB => personB.PersString), 
  personA => personA.PersString);

Example Join (using LINQ Enumerable extension method)

For those who prefer to use the LINQ extension methods:

IEnumerable<PersonA> collectionA;
IEnumerable<PersonB> collectionB;

// The result is a set of ValueTuple
IEnumerable<(PersonA, PersonB)> intersection = collectionA.Join(
  collectionB, 
  personA => personA.PersString, 
  personB => personB.PersString, 
  (personA, personB) => (personA, personB));
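
To make the difference between the two results concrete, here is a small side-by-side sketch. The `PersonA`/`PersonB` records and the `IntersectVsJoin` helper are hypothetical and only serve the illustration; `IntersectBy` requires .NET 6 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for PersonA and PersonB.
public record PersonA(int ID, string PersString);
public record PersonB(string PersString, string City);

public static class IntersectVsJoin
{
    // Set semantics: at most one PersonA per distinct key, and no PersonB data.
    public static List<PersonA> OnlyA(IEnumerable<PersonA> a, IEnumerable<PersonB> b) =>
        a.IntersectBy(b.Select(pb => pb.PersString), pa => pa.PersString).ToList();

    // Pair semantics: one (PersonA, PersonB) tuple for every matching combination.
    public static List<(PersonA A, PersonB B)> Pairs(IEnumerable<PersonA> a, IEnumerable<PersonB> b) =>
        a.Join(b, pa => pa.PersString, pb => pb.PersString, (pa, pb) => (A: pa, B: pb)).ToList();
}
```

For example, with the PersonA items (1, "x"), (2, "x"), (3, "y") and the PersonB items ("x", "Oslo"), ("x", "Rome"), `OnlyA` returns only (1, "x"), because the intersection is distinct by key, while `Pairs` returns four tuples.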
BionicCode
  • Intersect is not a viable option since OP has 2 different types: `TypeA` and `TypeB` – Tim Schmelter Sep 15 '22 at 09:41
  • This is where the .NET 6 `IntersectBy` excels: it allows to create an intersection of different types. – BionicCode Sep 15 '22 at 09:46
  • That's nice(if you can use it). But your examples are still using a single class `Person` which is not the use-case here. It also complicates a custom EqualityComparer if you have multiple types. So the `IEquatable` is not used here, you need to specify a comparer for the string property. – Tim Schmelter Sep 15 '22 at 09:54
  • Thank you. I have highlighted the fact that only `IntersectBy` can be used for the special case where two sets are of different type. – BionicCode Sep 15 '22 at 10:04
  • The intersection will only return the data of the elements of A. The information of the matches in B will not be returned. – JHBonarius Sep 15 '22 at 10:28
  • @JHBonarius That's correct. The return type in the example indicates this. IntersectBy is not useful if you need all intersecting instances. But since you are comparing on particular values/properties, chances are high that you are only interested in those values that occur in both collections. It definitely has its special use case. The same applies to Join. – BionicCode Sep 15 '22 at 10:52
  • @JHBonarius The OP must decide which method fits his exact requirements. Both methods, IntersectBy and Join, satisfy his question as posted. – BionicCode Sep 15 '22 at 10:55
  • 1
    @BionicCode: I think you misunderstood JHBonarius. Unlike with `Join` you don't get both information(TypeA and TypeB as a pair) from `Intersect`/`IntersectBy`. – Tim Schmelter Sep 15 '22 at 11:36
  • @TimSchmelter No, I understand. From the return type in my example, which I have explicitly defined, you could tell that I'm fully aware. From the details provided in the question we don't know if all instances are needed. He primarily wants to test which items are present in both sets. For example, if he is only interested in the PersString values that exist in both, he doesn't need all instances. It really depends on the exact use case which method is preferable. `IntersectBy` and `Join` both satisfy the requirement to find intersections in both sets. – BionicCode Sep 15 '22 at 11:44
0

Join is a more efficient way:

var result = from x in ListA
             join y in ListB on x.PersString equals y.PersString
             select new { x, y };
Vikram Bose