355

Assuming I have a left outer join as such:

from f in Foo
join b in Bar on f.Foo_Id equals b.Foo_Id into g
from result in g.DefaultIfEmpty()
select new { Foo = f, Bar = result }

How would I express the same task using extension methods? E.g.

Foo.GroupJoin(Bar, f => f.Foo_Id, b => b.Foo_Id, (f,b) => ???)
    .Select(???)
Gert Arnold
  • 105,341
  • 31
  • 202
  • 291
LaserJesus
  • 8,230
  • 7
  • 47
  • 65

10 Answers10

549

For a (left outer) join of a table Bar with a table Foo on Foo.Foo_Id = Bar.Foo_Id in lambda notation:

var qry = Foo.GroupJoin(
          Bar, 
          foo => foo.Foo_Id,
          bar => bar.Foo_Id,
          (x,y) => new { Foo = x, Bars = y })
       .SelectMany(
           x => x.Bars.DefaultIfEmpty(),
           (x,y) => new { Foo=x.Foo, Bar=y});
B--rian
  • 5,578
  • 10
  • 38
  • 89
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • 43
    This is actually not nearly as crazy as it seems. Basically `GroupJoin` does the left outer join, the `SelectMany` part is only needed depending on what you want to select. – George Mauer Nov 16 '14 at 16:36
  • 11
    This pattern is great because Entity Framework recognizes it as a Left Join, which I used to believe was an impossibility – Jesan Fafon Jun 14 '16 at 21:47
  • @MarcGravell How would you achieve the same to select `only` the rows where right side columns are all null (that is the case in SQL Server Outer Join when match does not meet)? – nam Sep 10 '16 at 04:13
  • 3
    @nam Well you'd need a where statement, x.Bar == null – Tod Oct 27 '16 at 11:46
  • Could I know why you needed SelectMany? does it depend on the type of relationship whether it is one to many? – Abdulkarim Kanaan Jan 05 '18 at 07:31
  • 4
    @AbdulkarimKanaan yes - SelectMany flattens two layers of 1-many into 1 layer with an entry per pair – Marc Gravell Jan 05 '18 at 14:29
  • Why are you using `DefaultIfEmpty()` if `GroupJoin()` does the left join? – Junior Jan 20 '18 at 00:10
  • The `DefaultIfEmpty` is needed to transform the empty `Bars` sequence created by `GroupJoin` when they are no matching `Bar` elements into a one (null) element sequence for the `SelectMany` to then match with the `Foo` element. – NetMage Mar 13 '18 at 18:23
  • Is there a reason to prefer your `SelectMany` over the one argument `SelectMany(xyj => xyj.y.DefaultIfEmpty.Select(y => new { Foo=xyj.x, Bar = y }))` ? – NetMage Mar 13 '18 at 18:25
  • A quick test seems to indicate LINQ generates a 2nd unnecessary `SELECT` ... `FROM` using the one argument version, but either way I would hope it would be optimized away. – NetMage Mar 13 '18 at 18:43
  • @GeorgeMauer: "*Basically GroupJoin does the left outer join, the SelectMany part is only needed depending on what you want to select*". Correct me if I'm wrong: `GroupJoin` doesn't flatten the `IGrouping` (which contains 0 or 1 items). Therefore the left outer join is incomplete and `SelectMany` is needed to complete it anyway – mins May 19 '20 at 12:09
158

Since this seems to be the de facto SO question for left outer joins using the method (extension) syntax, I thought I would add an alternative to the currently selected answer that (in my experience at least) has been more commonly what I'm after

// Option 1: Expecting either 0 or 1 matches from the "Right"
// table (Bars in this case):
var qry = Foos.GroupJoin(
          Bars,
          foo => foo.Foo_Id,
          bar => bar.Foo_Id,
          (f,bs) => new { Foo = f, Bar = bs.SingleOrDefault() });

// Option 2: Expecting either 0 or more matches from the "Right" table
// (courtesy of currently selected answer):
var qry = Foos.GroupJoin(
                  Bars, 
                  foo => foo.Foo_Id,
                  bar => bar.Foo_Id,
                  (f,bs) => new { Foo = f, Bars = bs })
              .SelectMany(
                  fooBars => fooBars.Bars.DefaultIfEmpty(),
                  (x,y) => new { Foo = x.Foo, Bar = y });

To display the difference using a simple data set (assuming we're joining on the values themselves):

List<int> tableA = new List<int> { 1, 2, 3 };
List<int?> tableB = new List<int?> { 3, 4, 5 };

// Result using both Option 1 and 2. Option 1 would be a better choice
// if we didn't expect multiple matches in tableB.
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3    }

List<int> tableA = new List<int> { 1, 2, 3 };
List<int?> tableB = new List<int?> { 3, 3, 4 };

// Result using Option 1 would be that an exception gets thrown on
// SingleOrDefault(), but if we use FirstOrDefault() instead to illustrate:
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3    } // Misleading, we had multiple matches.
                    // Which 3 should get selected (not arbitrarily the first)?.

// Result using Option 2:
{ A = 1, B = null }
{ A = 2, B = null }
{ A = 3, B = 3    }
{ A = 3, B = 3    }    

Option 2 is true to the typical left outer join definition, but as I mentioned earlier is often unnecessarily complex depending on the data set.

Ocelot20
  • 10,510
  • 11
  • 55
  • 96
  • 9
    I think "bs.SingleOrDefault()" will not work if you have another following Join or Include. We need the "bs.FirstOrDefault()" in this cases. – Dherik Feb 03 '15 at 12:35
  • 3
    True, Entity Framework and Linq to SQL both require that since they can't easily do the `Single` check amidst a join. `SingleOrDefault` however is a more "correct" way to demonstrate this IMO. – Ocelot20 Feb 03 '15 at 16:19
  • 1
    You need to remember to Order your joined table or the .FirstOrDefault() is going to get a random row from the multiple rows that might match the join criteria, whatever the database happens to find first. – Chris Moschini Oct 01 '15 at 14:22
  • 2
    @ChrisMoschini: Order and FirstOrDefault are unnecessary since the example is for a 0 or 1 match where you would want to fail on multiple records (see comment above code). – Ocelot20 Oct 01 '15 at 17:22
  • That's applying a requirement not specified in this Question. I never want my queries to throw; if what I get back is more complex than I expected I want to inspect it and make decisions rather than blow up the code. EF exceptions are hard to read and should be truly exceptional and unexpected, rather than just unusual data. See also @Dherik's comment noting the problem with using SingleOrDefault in the first place. – Chris Moschini Oct 01 '15 at 18:40
  • 5
    This isn't an "extra requirement" unspecified in the question, it's what a lot of people think of when they say "Left Outer Join". Also, the FirstOrDefault requirement referred to by Dherik is EF/L2SQL behavior and not L2Objects (neither of these are in the tags). SingleOrDefault is absolutely the correct method to call in this case. Of course you want to throw an exception if you encounter more records than possible for your data set instead of picking an arbitrary one and leading to a confusing undefined result. – Ocelot20 Oct 01 '15 at 19:45
  • I believe your Option 2 `qry` should end with `.GroupBy(x => x.Foo)`. Then a resolving assignment should be like `var result = qry.ToList().Select(x => x.First().Foo).ToList();` When you look at your *Result using Option 2* you can see the duplicates. That is from a Cartesian Result from the join. It's perfectly natural, but in code, we want the relationships resolved and grouped. – Suamere Sep 01 '18 at 15:44
  • 1
    +1 one for going a little deeper and providing an additional option for joins not having multiple matches, which was my case. – Kevbo Nov 15 '18 at 19:11
  • be careful with the first one: **IF using LINQ2SQL** (*do you still use that?*) the `SingleOrDefault()` variant gets translated into a N+1 queries instead of a single SQL query. – mbx Apr 21 '22 at 13:48
70

Group Join method is unnecessary to achieve joining of two data sets.

Inner Join:

var qry = Foos.SelectMany
            (
                foo => Bars.Where (bar => foo.Foo_id == bar.Foo_id),
                (foo, bar) => new
                    {
                    Foo = foo,
                    Bar = bar
                    }
            );

For Left Join just add DefaultIfEmpty()

var qry = Foos.SelectMany
            (
                foo => Bars.Where (bar => foo.Foo_id == bar.Foo_id).DefaultIfEmpty(),
                (foo, bar) => new
                    {
                    Foo = foo,
                    Bar = bar
                    }
            );

EF and LINQ to SQL correctly transform to SQL. For LINQ to Objects it is beter to join using GroupJoin as it internally uses Lookup. But if you are querying DB then skipping of GroupJoin is AFAIK as performant.

Personlay for me this way is more readable compared to GroupJoin().SelectMany()

afruzan
  • 1,454
  • 19
  • 20
  • This perfomed better than a .Join for me, plus I could do my conditonal joint that I wanted (right.FooId == left.FooId || right.FooId == 0) – Anders Mar 12 '18 at 11:38
  • linq2sql translates this approach as left join. this answer is better and simpler. +1 – afruzan Dec 10 '19 at 11:00
  • 4
    Warning! Changing my query from GroupJoin to this approach resulted in a CROSS OUTER APPLY instead of a LEFT OUTER JOIN. That can result in very different performance based on your query. (Using EF Core 5) – Vyrotek Jan 28 '21 at 18:27
21

You can create extension method like:

public static IEnumerable<TResult> LeftOuterJoin<TSource, TInner, TKey, TResult>(this IEnumerable<TSource> source, IEnumerable<TInner> other, Func<TSource, TKey> func, Func<TInner, TKey> innerkey, Func<TSource, TInner, TResult> res)
    {
        return from f in source
               join b in other on func.Invoke(f) equals innerkey.Invoke(b) into g
               from result in g.DefaultIfEmpty()
               select res.Invoke(f, result);
    }
hajirazin
  • 817
  • 8
  • 19
  • This looks like it would work (for my requirement). Can you provide an example? I am new to LINQ Extensions and am having a hard time wrapping my head around this Left Join situation I am in... – Shiva Oct 05 '14 at 03:26
  • @Skychan May be I need to look at it, it's old answer and was working at that time. Which Framework are you using? I mean .NET version? – hajirazin Oct 12 '16 at 05:25
  • 2
    This works for Linq to Objects but not when querying a database as you need to operate on an IQuerable and use Expressions of Funcs instead – Bob Vale May 02 '18 at 09:17
7

Improving on Ocelot20's answer, if you have a table you're left outer joining with where you just want 0 or 1 rows out of it, but it could have multiple, you need to Order your joined table:

var qry = Foos.GroupJoin(
      Bars.OrderByDescending(b => b.Id),
      foo => foo.Foo_Id,
      bar => bar.Foo_Id,
      (f, bs) => new { Foo = f, Bar = bs.FirstOrDefault() });

Otherwise which row you get in the join is going to be random (or more specifically, whichever the db happens to find first).

Chris Moschini
  • 36,764
  • 19
  • 160
  • 190
3

Whilst the accepted answer works and is good for Linq to Objects it bugged me that the SQL query isn't just a straight Left Outer Join.

The following code relies on the LinqKit Project that allows you to pass expressions and invoke them to your query.

static IQueryable<TResult> LeftOuterJoin<TSource,TInner, TKey, TResult>(
     this IQueryable<TSource> source, 
     IQueryable<TInner> inner, 
     Expression<Func<TSource,TKey>> sourceKey, 
     Expression<Func<TInner,TKey>> innerKey, 
     Expression<Func<TSource, TInner, TResult>> result
    ) {
    return from a in source.AsExpandable()
            join b in inner on sourceKey.Invoke(a) equals innerKey.Invoke(b) into c
            from d in c.DefaultIfEmpty()
            select result.Invoke(a,d);
}

It can be used as follows

Table1.LeftOuterJoin(Table2, x => x.Key1, x => x.Key2, (x,y) => new { x,y});
Bob Vale
  • 18,094
  • 1
  • 42
  • 49
2

Turning Marc Gravell's answer into an extension method, I made the following.

internal static IEnumerable<Tuple<TLeft, TRight>> LeftJoin<TLeft, TRight, TKey>(
    this IEnumerable<TLeft> left,
    IEnumerable<TRight> right,
    Func<TLeft, TKey> selectKeyLeft,
    Func<TRight, TKey> selectKeyRight,
    TRight defaultRight = default(TRight),
    IEqualityComparer<TKey> cmp = null)
{
    return left.GroupJoin(
            right,
            selectKeyLeft,
            selectKeyRight,
            (x, y) => new Tuple<TLeft, IEnumerable<TRight>>(x, y),
            cmp ?? EqualityComparer<TKey>.Default)
        .SelectMany(
            x => x.Item2.DefaultIfEmpty(defaultRight),
            (x, y) => new Tuple<TLeft, TRight>(x.Item1, y));
}
2

I have this question bookmarked and need to reference it every year or so. Each time I revisit this, I find I have forgotten how it works. Here's a more detailed explanation of what's happening.

GroupJoin is like a mix of GroupBy and Join. GroupJoin basically groups the outer collection by the join key, then joins the groupings to the inner collection on the join key. Suppose we have customers and orders. If you GroupJoin on the respective IDs, the result is an enumerable of {Customer, IGrouping<int, Order>}. The reason GroupJoin is useful is because all inner objects are represented even if the outer collection contains no matching objects. For customers with no orders, the IGrouping<int, Order> is simply empty. Once we have { Customer, IGrouping<int, Order> }, we can use as-is, filter out results that have no orders, or flatten with SelectMany to get results like a traditional LINQ Join.

Here's a full example if anyone wants to step through with the debugger and see how this works:

using System;
using System.Linq;
                    
public class Program
{
    public static void Main()
    {
        //Create some customers
        var customers = new Customer[]
        {
            new Customer(1, "Alice"),
            new Customer(2, "Bob"),
            new Customer(3, "Carol")
        };
        
        //Create some orders for Alice and Bob, but none for Carol
        var orders = new Order[]
        {
            new Order(1, 1),
            new Order(2, 1),
            new Order(3, 1),
            new Order(4, 2),
            new Order(5, 2)
        };

        //Group join customers to orders.
        //Result is IEnumerable<Customer, IGrouping<int, Order>>. 
        //Every customer will be present. 
        //If a customer has no orders, the IGrouping<> will be empty.
        var groupJoined = customers.GroupJoin(orders,
                              c => c.ID,
                              o => o.CustomerID,
                              (customer, order) => (customer, order));

        //Display results. Prints:
        //    Customer: Alice (CustomerID=1), Orders: 3
        //    Customer: Bob (CustomerID=2), Orders: 2
        //    Customer: Carol (CustomerID=3), Orders: 0
        foreach(var result in groupJoined)
        {
            Console.WriteLine($"Customer: {result.customer.Name} (CustomerID={result.customer.ID}), Orders: {result.order.Count()}");
        }
        
        //Flatten the results to look more like a LINQ join
        //Produces an enumerable of { Customer, Order }
        //All customers represented, order is null if customer has no orders
        var flattened = groupJoined.SelectMany(z => z.order.DefaultIfEmpty().Select(y => new { z.customer, y }));

        //Get only results where the outer table is null.
        //roughly equivalent to: 
        //SELECT * 
        //FROM A 
        //LEFT JOIN B 
        //ON A.ID = B.ID 
        //WHERE B.ID IS NULL;
        var noMatch = groupJoined.Where(z => z.order.DefaultIfEmpty().Count() == 0);
    }
}

class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }

    public Customer(int iD, string name)
    {
        ID = iD;
        Name = name;
    }
}

class Order
{
    static Random Random { get; set; } = new Random();

    public int ID { get; set; }
    public int CustomerID { get; set; }
    public decimal Amount { get; set; }

    public Order(int iD, int customerID)
    {
        ID = iD;
        CustomerID = customerID;
        Amount = (decimal)Random.Next(1000, 10000) / 100;
    }
}
Patrick Tucci
  • 1,824
  • 1
  • 16
  • 22
0

Marc Gravell's answer turn into an extension method that support the IQueryable<T> interface is given in this answer and with added support for C# 8.0 NRT reads as follows:

#nullable enable
using LinqKit;
using LinqKit.Core;
using System.Linq.Expressions;

...

/// <summary>
/// Left join queryable. Linq to SQL compatible. IMPORTANT: any Includes must be put on the source collections before calling this method.
/// </summary>
public static IQueryable<TResult> LeftJoin<TOuter, TInner, TKey, TResult>(
    this IQueryable<TOuter> outer,
    IQueryable<TInner> inner,
    Expression<Func<TOuter, TKey>> outerKeySelector,
    Expression<Func<TInner, TKey>> innerKeySelector,
    Expression<Func<TOuter, TInner?, TResult>> resultSelector)
{
    return outer
        .AsExpandable()
        .GroupJoin(
            inner,
            outerKeySelector,
            innerKeySelector,
            (outerItem, innerItems) => new { outerItem, innerItems })
        .SelectMany(
            joinResult => joinResult.innerItems.DefaultIfEmpty(),
            (joinResult, innerItem) =>
                resultSelector.Invoke(joinResult.outerItem, innerItem));
}
Dejan
  • 9,150
  • 8
  • 69
  • 117
-1

It's more simplified for me.

var appuser = appUsers.GroupJoin(trackLogin, u => u.Id, ur => ur.UserId, (u, ur) => new { u = u, ur = ur })
                    .Select( m => new { m.u.Id, m.u.Email, m.u.IsSuperUser, m.u.RoleId, 
                        LastLogin = m.ur.Select(t => t.LastLogin).FirstOrDefault()}).ToList();
Majedur
  • 3,074
  • 1
  • 30
  • 43