Left Outer Join: what's the difference between these two approaches?

Question

What's the difference between these two approaches to performing a left outer join with LINQ? For both I am using two lists, Buyers and Suppliers, and joining them on a common district to find the suppliers and buyers that are in the same district.

class Supplier
{
    public string Name { get; set; }
    public string District { get; set; }
}

class Buyer
{
    public string Name { get; set; }
    public string District { get; set; }
}
List<Buyer> buyers = new List<Buyer>()
{
    new Buyer() { Name = "Johny", District = "Fantasy District" },
    new Buyer() { Name = "Peter", District = "Scientists District" },
    new Buyer() { Name = "Paul", District = "Fantasy District" },
    new Buyer() { Name = "Maria", District = "Scientists District" },
    new Buyer() { Name = "Joshua", District = "EarthIsFlat District" },
    new Buyer() { Name = "Sylvia", District = "Developers District" },
    new Buyer() { Name = "Rebecca", District = "Scientists District" },
    new Buyer() { Name = "Jaime", District = "Developers District" },
    new Buyer() { Name = "Pierce", District = "Fantasy District" }
};
List<Supplier> suppliers = new List<Supplier>()
{
    new Supplier() { Name = "Harrison", District = "Fantasy District" },
    new Supplier() { Name = "Charles", District = "Developers District" },
    new Supplier() { Name = "Hailee", District = "Scientists District" },
    new Supplier() { Name = "Taylor", District = "EarthIsFlat District" }
};

First:

var suppliersAndBuyers = from s in suppliers
                         orderby s.District
                         join b in buyers on s.District equals b.District into buyersGroup
                         select buyersGroup.DefaultIfEmpty(
                             new Buyer()
                             {
                                 Name = string.Empty,
                                 District = s.District
                             });

foreach (var item in suppliersAndBuyers)
{
    foreach (var buyer in item)
    {
        Console.WriteLine($"{buyer.District} {buyer.Name}");
    }
}

And the second approach:

var suppliersAndBuyers = from s in suppliers
                         orderby s.District
                         join b in buyers on s.District equals b.District into buyersGroup
                         from bG in buyersGroup.DefaultIfEmpty()
                         select new
                         {
                             Name = bG.Name == null ? string.Empty : bG.Name,
                             s.District,
                         };

foreach (var item in suppliersAndBuyers)
{
    Console.WriteLine($"{item.District} {item.Name}");
}

Both produce exactly the same output. Is the only difference the way we output the results? Which one should I use?

Edit: The first approach returns IEnumerable<IEnumerable<Buyer>>, the second one returns IEnumerable<AnonymousType>. Is the return type the only meaningful difference between the two, and is it the only deciding factor, i.e. whether I want a named type or an anonymous type?

Tip: `Name = bG.Name == null ? string.Empty : bG.Name` -> `Name = bG.Name ?? ""`. — ErikE, Feb 27 '18 at 20:16
Is this a hypothetical about how this affects a database call, or is it in-memory only? — Igor, Feb 27 '18 at 20:16
@Igor I am studying LINQ atm, so I have no real scenario. I was following the Join examples on the MSDN site, saw both of these approaches and applied them, but can't figure out the difference. — Darkbound, Feb 27 '18 at 20:17
On the second, you iterate through your enumerable one more time, constructing all the objects again. I would always prefer the first, personally. However, I believe it, too, uses unnecessary additional memory by creating many empty objects and just not using them... — Yotam Salmon, Feb 27 '18 at 20:19
@YotamSalmon that makes sense. Why does the first one return IEnumerable<IEnumerable<Buyer>> and not just IEnumerable<Buyer>? — Darkbound, Feb 27 '18 at 20:20
The first one returns `IEnumerable<IEnumerable<Buyer>>`??? That's a bit weird to me... Are you sure? — Yotam Salmon, Feb 27 '18 at 20:21
@YotamSalmon http://prntscr.com/iklg6r and that's why I have to use two nested loops... it does... maybe there is something wrong with my query? — Darkbound, Feb 27 '18 at 20:22
Give me a couple of minutes. I must have gotten confused of C#'s `join`. :-P — Yotam Salmon, Feb 27 '18 at 20:24
@mjwills my question is what is the difference between the two approaches, and when to use which. — Darkbound, Feb 27 '18 at 20:28
@mjwills we keep going in a circle.... why would I ever want to use IEnumerable<IEnumerable<Buyer>>? Is the only meaningful difference between the two approaches that one returns an anonymous type and the other returns a type (Buyer in this example)? — Darkbound, Feb 27 '18 at 20:30
@Darkbound I answered your second question in my answer. Adding the answer to the original question now. — Yotam Salmon, Feb 27 '18 at 20:39

Yotam Salmon · Accepted Answer · 2018-02-27T20:41:38.783

Alright. From what I see: (A)

var suppliersAndBuyers = from s in suppliers
                         orderby s.District

Enumerates the suppliers list. That's obvious. Now, joining it into the buyers list:

var suppliersAndBuyers = from s in suppliers
                         orderby s.District
                         join b in buyers on s.District equals b.District

This creates the matches (objects whose type I don't know offhand, since I don't have a normal Visual Studio instance in front of me). For example, it's like Harrison:Johny, Hailee:Peter, .... Now we can create an IEnumerable of objects based on those matches (represented by the variables b and s) like this:

var suppliersAndBuyers = from s in suppliers
                         orderby s.District
                         join b in buyers on s.District equals b.District
                         select new {
                             Supplier = s, Buyer = b
                         }

This will create an IEnumerable of anonymous types, with each of the objects representing a pair of a supplier and buyer.

var suppliersAndBuyers = from s in suppliers
                         orderby s.District
                         join b in buyers on s.District equals b.District into buyersGroup

But what you decided to do is a left join, as written in your title. The into clause turns the join into a group join: it takes every element of the list created in snippet (A) and pairs it with an IEnumerable of all the matching objects from the buyers list. That produces an enumerable of groups of matches. For example, for Harrison, the first entry in the suppliers list, you'll get an IEnumerable containing Johny, Paul and Pierce. The same goes for the other elements in the suppliers list, ordered by their District.

And this is the reason why you end up with an IEnumerable<IEnumerable<Buyer>>: for each supplier (the first IEnumerable dimension) you have a "list" of Buyer objects (the second dimension, which also explains the type).

Coalescing the empty entries is then unnecessary in my opinion, because you should not get nulls, just empty IEnumerables, and when iterating through them you will simply not hit any element. (I'm not sure about this paragraph, though, as I never got to compile the code.)

Now as for the coalescing part: the first example creates a new Buyer object for each of the entries and passes it to DefaultIfEmpty. The second example first creates the full first-dimension and second-dimension IEnumerables, then coalesces the empty values when iterating again, which, as I mentioned in the comments, is one unnecessary loop.
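To make the two dimensions concrete, here is a minimal sketch, not the asker's exact code. The class GroupJoinSketch and its method names are made up for illustration, and the data is a trimmed copy of the question's lists. It uses Enumerable.GroupJoin, which is what a join ... into clause compiles to, and then flattens the groups with SelectMany.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Supplier { public string Name { get; set; } public string District { get; set; } }
class Buyer { public string Name { get; set; } public string District { get; set; } }

// Hypothetical helper class, for illustration only.
static class GroupJoinSketch
{
    static readonly List<Supplier> Suppliers = new List<Supplier>
    {
        new Supplier { Name = "Harrison", District = "Fantasy District" },
        new Supplier { Name = "Charles", District = "Developers District" }
    };

    static readonly List<Buyer> Buyers = new List<Buyer>
    {
        new Buyer { Name = "Johny", District = "Fantasy District" },
        new Buyer { Name = "Paul", District = "Fantasy District" },
        new Buyer { Name = "Sylvia", District = "Developers District" }
    };

    // One inner sequence of buyers per supplier: the "two dimensions".
    public static IEnumerable<IEnumerable<Buyer>> Grouped() =>
        Suppliers
            .OrderBy(s => s.District)
            .GroupJoin(Buyers, s => s.District, b => b.District,
                       (s, buyersGroup) => buyersGroup);

    // Flattening the groups gives a single sequence of buyers.
    public static IEnumerable<Buyer> Flattened() =>
        Grouped().SelectMany(group => group);

    static void Main()
    {
        // Two nested levels: one line of output per supplier.
        foreach (IEnumerable<Buyer> group in Grouped())
            Console.WriteLine(string.Join(", ", group.Select(b => b.Name)));

        // One level after flattening.
        Console.WriteLine(string.Join(", ", Flattened().Select(b => b.Name)));
    }
}
```

Grouped() needs two nested loops to reach a Buyer, while Flattened() needs only one, which is the same shape difference as between the two queries in the question.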
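For reference, a small sketch of what DefaultIfEmpty does with an empty group in LINQ to Objects. The class DefaultIfEmptySketch, its method names and the "Nowhere District" value are invented for the example and are not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Buyer { public string Name { get; set; } public string District { get; set; } }

// Hypothetical helper class, for illustration only.
static class DefaultIfEmptySketch
{
    // An empty group, as produced for a supplier with no matching buyers.
    static readonly IEnumerable<Buyer> EmptyGroup = Enumerable.Empty<Buyer>();

    // No argument: the result holds one element, default(Buyer), which is null.
    public static Buyer WithoutPlaceholder() =>
        EmptyGroup.DefaultIfEmpty().Single();

    // With an argument: the result holds that placeholder object instead.
    public static Buyer WithPlaceholder() =>
        EmptyGroup
            .DefaultIfEmpty(new Buyer { Name = string.Empty, District = "Nowhere District" })
            .Single();

    static void Main()
    {
        Buyer missing = WithoutPlaceholder();
        Console.WriteLine(missing == null); // True

        // missing.Name would throw a NullReferenceException here;
        // the null-conditional operator avoids that.
        Console.WriteLine(missing?.Name ?? "(none)");

        Console.WriteLine(WithPlaceholder().District);
    }
}
```

If this is right, the second query's bG.Name == null check would only be reached for buyers that exist; for a supplier with no buyers, bG itself is null, so something like bG?.Name ?? string.Empty would be the safer form.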

Thanks Yotam, that clears up the confusion about IEnumerable<IEnumerable<Buyer>>. The question still remains: when to use either of these approaches? You mentioned in your initial reply that the difference is that the second one will iterate over the group one more time, but when I print them with the first approach I still iterate one additional time. Is the only "meaningful difference" the fact that one returns IEnumerable<IEnumerable<Buyer>> and the other returns anonymous types? — Darkbound, Feb 27 '18 at 20:41
Yes, in fact the practical difference for you is the **type** of the expression. But just a clarification, I told you that the second example does one more **iteration**, which doesn't really affect the following `for` loops you wrote. But both have 2 **dimensions** of IEnumerable. This means you will have 2 levels of nested `for`. If you want to save one level and only use one `for`, you can stop at the second code snippets, and instead of a left join, make an inner join. Then you'll end up with a 1 dimensional IEnumerable of matches. — Yotam Salmon, Feb 27 '18 at 20:44

Salah Akbari · Answer 2 · 2018-02-27T20:29:04.193

In the first one you are returning an IEnumerable of Buyer because of new Buyer(), while in the second one you are returning an IEnumerable of anonymous types because of select new. Each has its pros and cons.

You can check the following answers to decide:

https://stackoverflow.com/a/21443164/2946329

https://stackoverflow.com/a/48677/2946329

edited Feb 27 '18 at 20:29

answered Feb 27 '18 at 20:14

Salah Akbari


Actually I am returning IEnumerable<IEnumerable<Buyer>>. That still doesn't answer what the difference is and when to apply either of these approaches – Darkbound Feb 27 '18 at 20:15
Or at least I can't deduce the answer :) – Darkbound Feb 27 '18 at 20:16
@mjwills I understand why one returns buyer and the other returns anonymous type, I just can't understand if this is the only practical difference between the two approaches? – Darkbound Feb 27 '18 at 20:28
@Darkbound I can't see any major difference except this. Please have a look at the linked references to be able to decide when to use which of them. – Salah Akbari Feb 27 '18 at 20:42

Igor · Answer 3 · 2018-02-27T20:43:25.977

Ok so it took me a while to decipher this one :)

The thing is, in the first example you are doing

select buyersGroup.DefaultIfEmpty( new Buyer() { Name = string.Empty, District = s.District });

The meaning of that line is: select each buyersGroup, and if it's empty, return new Buyer() { Name = string.Empty, District = s.District } (as you defined it as the default).

In the second example

from bG in buyersGroup.DefaultIfEmpty() select new { Name = bG.Name == null ? string.Empty : bG.Name, s.District, };

You first apply the default for an empty group with buyersGroup.DefaultIfEmpty() and only after that do the select. Pay close attention to what the DefaultIfEmpty parentheses enclose.

Edit

I can't seem to find a reason to have DefaultIfEmpty at all... I don't think you can get null on the left side, so you shouldn't need it... Leaving it out might also make the code slightly easier to understand.
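One case where DefaultIfEmpty does seem to make a visible difference is when the group is flattened with a second from clause, as in the second query. The sketch below assumes a supplier whose district has no buyers at all ("Empty District"), which does not occur in the question's data; the class LeftJoinSketch and its method names are likewise invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Supplier { public string Name { get; set; } public string District { get; set; } }
class Buyer { public string Name { get; set; } public string District { get; set; } }

// Hypothetical helper class, for illustration only.
static class LeftJoinSketch
{
    static readonly List<Supplier> Suppliers = new List<Supplier>
    {
        new Supplier { Name = "Harrison", District = "Fantasy District" },
        new Supplier { Name = "Taylor", District = "Empty District" } // assumed: no buyers here
    };

    static readonly List<Buyer> Buyers = new List<Buyer>
    {
        new Buyer { Name = "Johny", District = "Fantasy District" }
    };

    // Left outer join: a supplier without buyers still yields one row.
    public static List<string> WithDefaultIfEmpty() =>
        (from s in Suppliers
         join b in Buyers on s.District equals b.District into buyersGroup
         from bG in buyersGroup.DefaultIfEmpty()
         select s.District + ": " + (bG?.Name ?? "(no buyer)")).ToList();

    // Without DefaultIfEmpty an empty group contributes no rows,
    // so the result behaves like an inner join.
    public static List<string> WithoutDefaultIfEmpty() =>
        (from s in Suppliers
         join b in Buyers on s.District equals b.District into buyersGroup
         from bG in buyersGroup
         select s.District + ": " + bG.Name).ToList();

    static void Main()
    {
        Console.WriteLine(string.Join(" | ", WithDefaultIfEmpty()));
        Console.WriteLine(string.Join(" | ", WithoutDefaultIfEmpty()));
    }
}
```

With the question's own data every supplier has at least one buyer, so both variants would return the same rows there.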

Igor · Answer 4 · 2018-02-27T20:43:59.033

See Enumerable.DefaultIfEmpty. In your first example you are passing a new Buyer to that extension method as the default value to use when no Buyer instance matches in the join (i.e. your outer join). The select then returns each group as a whole rather than its elements, so the result is modeled as IEnumerable<IEnumerable<Buyer>>.

Your second query iterates each group with a second from clause and then uses a select with an anonymous projection, which is why it results in a flattened collection.

Purely looking at the resulting type, you could mimic the result of the first query by selecting the group itself in the second query instead of flattening it with the second from clause:

from s in suppliers
orderby s.District
join b in buyers on s.District equals b.District into buyersGroup
select buyersGroup.DefaultIfEmpty();

To go the other direction, you could iterate the group of the first query with a from clause and add the same select statement from the 2nd query at the end:

from s in suppliers
orderby s.District
join b in buyers on s.District equals b.District into buyersGroup
from bG in buyersGroup.DefaultIfEmpty(
    new Buyer()
    {
        Name = string.Empty,
        District = s.District
    })
select new
{
    Name = bG.Name == null ? string.Empty : bG.Name,
    s.District,
};

As to why use one over the other, that is a matter of opinion and also depends on the situation. In your very simple example, without any other context, using a projection would be the more readable answer. If you wanted to pass the result to another method, or return it from the method it is executing in, then the IEnumerable<IEnumerable<Buyer>> would be the only way to do it (if constrained to these 2 samples), because an anonymous type cannot be named in a method signature. If you had to do this against a data store, I would recommend checking which query is actually being executed and profiling that.

In short, there is no right or wrong answer until you have a specific real-world situation where there is a measurable difference between these 2. What that measure is, and the weight applied to it, depends on that situation.

Thanks, I understood that. The only question that remains at the moment (which was my main question) is when to use either of these approaches. Is the only "meaningful difference" between the two approaches that one of them returns some named type and the other one returns an anonymous type? — Darkbound, Feb 27 '18 at 20:38
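If you are not constrained to these 2 samples, one possible middle ground is to keep the flattened shape but project into a small named class, so the result can appear in a method signature. This is only a sketch under that assumption: SupplierBuyerRow, NamedProjectionSketch and Match are hypothetical names, not anything from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Supplier { public string Name { get; set; } public string District { get; set; } }
class Buyer { public string Name { get; set; } public string District { get; set; } }

// Hypothetical named result type replacing the anonymous projection.
class SupplierBuyerRow
{
    public string District { get; set; }
    public string BuyerName { get; set; }
}

// Hypothetical helper class, for illustration only.
static class NamedProjectionSketch
{
    // The return type can be written out because the rows have a named type.
    public static List<SupplierBuyerRow> Match(
        IEnumerable<Supplier> suppliers, IEnumerable<Buyer> buyers) =>
        (from s in suppliers
         orderby s.District
         join b in buyers on s.District equals b.District into buyersGroup
         from bG in buyersGroup.DefaultIfEmpty()
         select new SupplierBuyerRow
         {
             District = s.District,
             // bG is null when the supplier has no matching buyer.
             BuyerName = bG?.Name ?? string.Empty
         }).ToList();

    static void Main()
    {
        var suppliers = new[] { new Supplier { Name = "Harrison", District = "Fantasy District" } };
        var buyers = new[] { new Buyer { Name = "Johny", District = "Fantasy District" } };

        foreach (SupplierBuyerRow row in Match(suppliers, buyers))
            Console.WriteLine($"{row.District} {row.BuyerName}");
    }
}
```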
