I have this question bookmarked and need to reference it every year or so. Each time I revisit this, I find I have forgotten how it works. Here's a more detailed explanation of what's happening.
GroupJoin
is like a mix of GroupBy
and Join
. GroupJoin
basically groups the outer collection by the join key, then joins the groupings to the inner collection on the join key. Suppose we have customers and orders. If you GroupJoin
on the respective IDs, the result is an enumerable of {Customer, IGrouping<int, Order>}
. The reason GroupJoin
is useful is because all inner objects are represented even if the outer collection contains no matching objects. For customers with no orders, the IGrouping<int, Order>
is simply empty. Once we have { Customer, IGrouping<int, Order> }
, we can use as-is, filter out results that have no orders, or flatten with SelectMany
to get results like a traditional LINQ Join
.
Here's a full example if anyone wants to step through with the debugger and see how this works:
using System;
using System.Linq;
public class Program
{
public static void Main()
{
//Create some customers
var customers = new Customer[]
{
new Customer(1, "Alice"),
new Customer(2, "Bob"),
new Customer(3, "Carol")
};
//Create some orders for Alice and Bob, but none for Carol
var orders = new Order[]
{
new Order(1, 1),
new Order(2, 1),
new Order(3, 1),
new Order(4, 2),
new Order(5, 2)
};
//Group join customers to orders.
//Result is IEnumerable<Customer, IGrouping<int, Order>>.
//Every customer will be present.
//If a customer has no orders, the IGrouping<> will be empty.
var groupJoined = customers.GroupJoin(orders,
c => c.ID,
o => o.CustomerID,
(customer, order) => (customer, order));
//Display results. Prints:
// Customer: Alice (CustomerID=1), Orders: 3
// Customer: Bob (CustomerID=2), Orders: 2
// Customer: Carol (CustomerID=3), Orders: 0
foreach(var result in groupJoined)
{
Console.WriteLine($"Customer: {result.customer.Name} (CustomerID={result.customer.ID}), Orders: {result.order.Count()}");
}
//Flatten the results to look more like a LINQ join
//Produces an enumerable of { Customer, Order }
//All customers represented, order is null if customer has no orders
var flattened = groupJoined.SelectMany(z => z.order.DefaultIfEmpty().Select(y => new { z.customer, y }));
//Get only results where the outer table is null.
//roughly equivalent to:
//SELECT *
//FROM A
//LEFT JOIN B
//ON A.ID = B.ID
//WHERE B.ID IS NULL;
var noMatch = groupJoined.Where(z => z.order.DefaultIfEmpty().Count() == 0);
}
}
class Customer
{
public int ID { get; set; }
public string Name { get; set; }
public Customer(int iD, string name)
{
ID = iD;
Name = name;
}
}
class Order
{
static Random Random { get; set; } = new Random();
public int ID { get; set; }
public int CustomerID { get; set; }
public decimal Amount { get; set; }
public Order(int iD, int customerID)
{
ID = iD;
CustomerID = customerID;
Amount = (decimal)Random.Next(1000, 10000) / 100;
}
}