For the answer, scroll down to the answer section.
Disclaimer: I love EF. For 99.999% of calls made in my system, I can write the code (LINQ) the fastest, and the OR-Mapping is the fastest system. Also, the Generated Queries (While confusing to look at) have much faster execution plans than hand-written SQL. But that isn't the case here.
Research Section
To begin with an Aside: The raw SQL to view my final request is something like this:
SELECT * FROM [Busses] [bus]
LEFT JOIN [Passengers] [passenger] ON [passenger].[BusID] = [bus].[BusID] AND [passenger].[Awake] <> 1
LEFT JOIN [CarryOns] [carryOn] ON [carryOn].[PassengerID] = [passenger].[PassengerID]
LEFT JOIN [Luggages] [luggage] ON [luggage].[PassengerID] = [passenger].[PassengerID]
WHERE [bus].[IsDriving] = 1
Of course, if EF were to generate something for these results, it would require nesting and key fields to know how to map them. No big deal.
Unfortunately, in order to achieve this with a single hit to the database, I have to do the following:
var busses = context.Set<BusEntity>().Where(x => x.IsDriving);
var passengers = context.Set<PassengerEntity>().Where(x => x.Awake);
var carryOns = context.Set<CarryOnEntity>();
var luggages = context.Set<LuggageEntity>();
var passengerJoins = passengers.GroupJoin(
carryOns,
x => x.PassengerID,
y => y.PassengerID,
(x, y) => new { Passenger = x, CarryOns = y }
)
.SelectMany(
x => x.CarryOns.DefaultIfEmpty(),
(x, y) => new { Passenger = x.Passenger, CarryOns = x.CarryOns }
).GroupJoin(
luggages,
x => x.Passenger.PassengerID,
y => y.PassengerID,
(x, y) => new { Passenger = x.Passenger, CarryOns = x.CarryOns, Luggages = y }
)
.SelectMany(
x => x.Luggages.DefaultIfEmpty(),
(x, y) => new { Passenger = x.Passenger, CarryOns = x.CarryOns, Luggages = x.Luggages }
);
var bussesToPassengers = busses.GroupJoin(
passengerJoins,
x => x.BusID,
y => y.Passenger.BusID,
(x, y) => new { Bus = x, Passengers = y }
)
.SelectMany(
x => x.Passengers.DefaultIfEmpty(),
(x, y) => new { Bus = x.Bus, Passengers = x.Passengers }
)
.GroupBy(x => x.Bus);
var rez = bussesToPassengers.ToList()
.Select(x => x.First().Bus)
.ToList();
I don't complain about the EF Generated SQL, but the single SQL statement was a couple hundred lines. I hacked at it, removed the SELECT columns, and altered some ID's to match this question, it was something like this:
SELECT *
FROM ( SELECT *
FROM (SELECT *
FROM ( SELECT DISTINCT *
FROM [dbo].[Bus] AS [Extent1]
LEFT OUTER JOIN (SELECT *
FROM [dbo].[Passenger] AS [Extent2]
LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent3] ON [Extent2].[PassengerId] = [Extent3].[PassengerId]
LEFT OUTER JOIN [dbo].[Luggages] AS [Extent4] ON [Extent2].[PassengerId] = [Extent4].[PassengerId]
WHERE [Extent1].[IsDriving] = 1
) AS [Distinct1] ) AS [Project2]
OUTER APPLY (SELECT *
FROM (SELECT *
FROM [dbo].[Bus] AS [Extent6]
LEFT OUTER JOIN (SELECT *
FROM [dbo].[Passenger] AS [Extent7]
LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent8] ON [Extent7].[PassengerId] = [Extent8].[PassengerId]
LEFT OUTER JOIN [dbo].[Luggages] AS [Extent9] ON [Extent7].[PassengerId] = [Extent9].[PassengerId]
WHERE ([Extent6].[IsDriving] = 1) AND ([Project2].[BusId] = [Extent6].[BusId]) ) AS [Project3]
OUTER APPLY (SELECT *
FROM [dbo].[Passenger] AS [Extent11]
LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent12] ON [Extent11].[PassengerId] = [Extent12].[PassengerId]
LEFT OUTER JOIN [dbo].[Luggages] AS [Extent13] ON [Extent11].[PassengerId] = [Extent13].[PassengerId]
LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent15] ON [Extent11].[PassengerId] = [Extent15].[PassengerId]
WHERE ([Extent11].[IsAwake] = 1) AND ([Project3].[BusId] = [Extent11].[BusId])
UNION ALL
SELECT *
FROM [dbo].[Passenger] AS [Extent16]
LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent17] ON [Extent16].[PassengerId] = [Extent17].[PassengerId]
LEFT OUTER JOIN [dbo].[Luggages] AS [Extent18] ON [Extent16].[PassengerId] = [Extent18].[PassengerId]
WHERE ([Extent16].[IsAwake] = 1) AND ([Project3].[BusId] = [Extent16].[BusId])
) AS [Project7]
ORDER BY ........................
For my personal test data, My Hand-Written SQL Query returns 54 rows, and the EF Generated Query returns about 30,000 rows. So if you only consider the increase in time for the Over-The-Wire transfer of the data, that is not acceptable.
Answer Section
The answer is: You can use Linq to Entities (on DB) and Linq to Objects (in code) to achieve your results in a single call, but it will not be performant. You can instead choose multiple calls with better performance, including less data transferred over the wire, more readable generated queries, and more understandable code.
The best bet is to perform multiple queries. This is the way I am doing it:
var bus = context.Set<BusEntity>().Where(x => x.IsDriving).ToList();
var busIDs = bus.Select(x => x.BusID).ToList();
var passengers = context.Set<PassengerEntity>().Where(x => x.IsAwake && busIDs.Contains(x.BusID)).ToList();
var passengerIDs = passengers.Select(x => x.PassengerID).ToList();
var carryOns = context.Set<CarryOnEntity>().Where(x => passengerIDs.Contains(x.PassengerID)).ToList();
var luggages = context.Set<LuggageEntity>().Where(x => passengerIDs.Contains(x.PassengerID)).ToList();
passengers.ForEach(x => {
x.CarryOns = carryOns.Where(y => y.PassengerID == x.PassengerID).ToList();
x.Luggages = luggages.Where(y => y.PassengerID == x.PassengerID).ToList();
});
bus.ForEach(x => x.Passengers = passengers.Where(y => y.BusID == x.BusID).ToList());
This generated 4 calls. Altogether, the SQL had about 40 lines. I hacked at it, removed the SELECT columns, and altered some ID's to match this question, it was something like this:
SELECT * FROM [dbo].[Busses] AS [Extent1]
WHERE [Extent1].[IsDriving] = 1
SELECT * FROM [dbo].[Passengers] AS [Extent1]
WHERE ([Extent1].[Awake] = 1) AND ([Extent1].[BusID] IN (......................))
SELECT * FROM [dbo].[CarryOns] AS [Extent1]
WHERE [Extent1].[PassengerID] IN (......................)
SELECT * FROM [dbo].[Luggages] AS [Extent1]
WHERE [Extent1].[PassengerID] IN (......................)
The EF Generated Query returns about 100 rows total across the 4 round-trip calls. So that means 4 calls to the database, but all very small, readable, and very quick.
I didn't time it, but whenever I pause on a breakpoint above this answer's code, and F5 to the other side of the result, it is instant. When I do the same thing for the Single-Call in my research, it took a solid second or more, noticeable lag running over that.