1

This is a follow-up to the question here: Include with where clause. That question wants to find all Awake Passengers on Driving Busses

Without the WHERE Clause on Passengers, that is very simple, like so:

var result = Context.Busses.Where(x => x.IsDriving)
    .Include(x => x.Passengers);

Without the WHERE Clause on Passengers, it is also very simple to include Sub-Relationships from Passengers, like so:

var result = Context.Busses.Where(x => x.IsDriving)
    .Include(x => x.Passengers.CarryOns)
    .Include(x => x.Passengers.Luggage);

But that question requires the use of a WHERE Clause on the Navigation Property. The Answer to the prior question works perfect, sans Sub-Relationships:

var result = Context.Busses.Where(x => x.IsDriving)
    .Select(bus => new {bus, Passengers = bus.Passengers.Where(x => x.Awake)})
    .AsEnumerable().Select(x => x.bus).ToList();

But how do you also include the Passenger's CarryOns and Luggage using this method? Passengers is not queryable, so you can't Include at this point. I attempted something like this, but the first portion was just overwritten by the second portion:

var bussesQuery = Context.Busses.Where(x => x.IsDriving)
    .Include(x => x.Passengers.CarryOns)
    .Include(x => x.Passengers.Luggage);
// var check = bussesQuery.ToList(); 
// The Sub-Relationship data is included here, but the Passengers are not filtered.
var result = bussesQuery
    .Select(bus => new {bus, Passengers = bus.Passengers.Where(x => x.Awake)})
    .AsEnumerable().Select(x => x.bus).ToList();
// The Sub-Relationship data is missing, but the Passengers are filtered
Suamere
  • 5,691
  • 2
  • 44
  • 58

2 Answers2

0

By looking on Your previous query I see that You go from the extreme of N+1 to the extreme of single query. You used to have lots of queries, and now You want to have one, but please consider what is going on under the hood. To fetch the data entity framework needs to cross join all entities, so for each included entity you get extra columns in Your results AND result is crossjoined with that include.

Let's say You have 5 driving buses, 30 awake passengers and 15 luggage per bus, as a result You get buses x luggage x passengers = 2250 records, each containing passenger and luggage data`. If you query passengers and luggage with separate queries You will have much less records (5 * 30 + 5 * 15 = 225) and each entity will be fetched once.

Doing one big query that will return everything is not that good idea - it is slower, harder to maintain and not worth Your time. Just do query for awake passengers, and then query for luggage.

Shadow
  • 2,089
  • 2
  • 23
  • 45
  • Sure, if your query eager loads the entire database and you filter in memory. The dream here is to accomplish the same thing I could do with raw SQL, but using LinqToEntities, where the clauses filter within the DB. But you are correct to say eager loading to filter in-memory is bad. I definitely don't want that. – Suamere Sep 01 '18 at 04:16
0

For the answer, scroll down to the answer section.

Disclaimer: I love EF. For 99.999% of calls made in my system, I can write the code (LINQ) the fastest, and the OR-Mapping is the fastest system. Also, the Generated Queries (While confusing to look at) have much faster execution plans than hand-written SQL. But that isn't the case here.

Research Section

To begin with an Aside: The raw SQL to view my final request is something like this:

SELECT * FROM [Busses] [bus]
LEFT JOIN [Passengers] [passenger] ON [passenger].[BusID] = [bus].[BusID] AND [passenger].[Awake] <> 1
    LEFT JOIN [CarryOns] [carryOn] ON [carryOn].[PassengerID] = [passenger].[PassengerID]
    LEFT JOIN [Luggages] [luggage] ON [luggage].[PassengerID] = [passenger].[PassengerID]
WHERE [bus].[IsDriving] = 1

Of course, if EF were to generate something for these results, it would require nesting and key fields to know how to map them. No big deal.

Unfortunately, in order to achieve this with a single hit to the database, I have to do the following:

var busses = context.Set<BusEntity>().Where(x => x.IsDriving);
var passengers = context.Set<PassengerEntity>().Where(x => x.Awake);
var carryOns = context.Set<CarryOnEntity>();
var luggages = context.Set<LuggageEntity>();

var passengerJoins = passengers.GroupJoin(
        carryOns,
        x => x.PassengerID,
        y => y.PassengerID,
        (x, y) => new { Passenger = x, CarryOns = y }
    )
    .SelectMany(
        x => x.CarryOns.DefaultIfEmpty(),
        (x, y) => new { Passenger = x.Passenger, CarryOns = x.CarryOns }
    ).GroupJoin(
        luggages,
        x => x.Passenger.PassengerID,
        y => y.PassengerID,
        (x, y) => new { Passenger = x.Passenger, CarryOns = x.CarryOns, Luggages = y }
    )
    .SelectMany(
        x => x.Luggages.DefaultIfEmpty(),
        (x, y) => new { Passenger = x.Passenger, CarryOns = x.CarryOns, Luggages = x.Luggages }
    );

var bussesToPassengers = busses.GroupJoin(
        passengerJoins,
        x => x.BusID,
        y => y.Passenger.BusID,
        (x, y) => new { Bus = x, Passengers = y }
    )
    .SelectMany(
        x => x.Passengers.DefaultIfEmpty(),
        (x, y) => new { Bus = x.Bus, Passengers = x.Passengers }
    )
    .GroupBy(x => x.Bus);

var rez = bussesToPassengers.ToList()
    .Select(x => x.First().Bus)
    .ToList();

I don't complain about the EF Generated SQL, but the single SQL statement was a couple hundred lines. I hacked at it, removed the SELECT columns, and altered some ID's to match this question, it was something like this:

SELECT *
FROM ( SELECT *
    FROM   (SELECT *
        FROM ( SELECT DISTINCT *
            FROM  [dbo].[Bus] AS [Extent1]
            LEFT OUTER JOIN  (SELECT *
                FROM    [dbo].[Passenger] AS [Extent2]
                LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent3] ON [Extent2].[PassengerId] = [Extent3].[PassengerId]
                LEFT OUTER JOIN [dbo].[Luggages] AS [Extent4] ON [Extent2].[PassengerId] = [Extent4].[PassengerId]
            WHERE [Extent1].[IsDriving] = 1
        )  AS [Distinct1] ) AS [Project2]
    OUTER APPLY  (SELECT *
        FROM   (SELECT *
            FROM  [dbo].[Bus] AS [Extent6]
            LEFT OUTER JOIN  (SELECT *
                FROM    [dbo].[Passenger] AS [Extent7]
                LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent8] ON [Extent7].[PassengerId] = [Extent8].[PassengerId]
                LEFT OUTER JOIN [dbo].[Luggages] AS [Extent9] ON [Extent7].[PassengerId] = [Extent9].[PassengerId]
            WHERE ([Extent6].[IsDriving] = 1) AND ([Project2].[BusId] = [Extent6].[BusId]) ) AS [Project3]
        OUTER APPLY  (SELECT *
            FROM     [dbo].[Passenger] AS [Extent11]
            LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent12] ON [Extent11].[PassengerId] = [Extent12].[PassengerId]
            LEFT OUTER JOIN [dbo].[Luggages] AS [Extent13] ON [Extent11].[PassengerId] = [Extent13].[PassengerId]
            LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent15] ON [Extent11].[PassengerId] = [Extent15].[PassengerId]
            WHERE ([Extent11].[IsAwake] = 1) AND ([Project3].[BusId] = [Extent11].[BusId])
        UNION ALL
            SELECT *
            FROM     [dbo].[Passenger] AS [Extent16]
            LEFT OUTER JOIN [dbo].[CarryOns] AS [Extent17] ON [Extent16].[PassengerId] = [Extent17].[PassengerId]
            LEFT OUTER JOIN [dbo].[Luggages] AS [Extent18] ON [Extent16].[PassengerId] = [Extent18].[PassengerId]
            WHERE ([Extent16].[IsAwake] = 1) AND ([Project3].[BusId] = [Extent16].[BusId])
)  AS [Project7]
ORDER BY ........................

For my personal test data, My Hand-Written SQL Query returns 54 rows, and the EF Generated Query returns about 30,000 rows. So if you only consider the increase in time for the Over-The-Wire transfer of the data, that is not acceptable.

Answer Section

The answer is: You can use Linq to Entities (on DB) and Linq to Objects (in code) to achieve your results in a single call, but it will not be performant. You can instead choose multiple calls with better performance, including less data transferred over the wire, more readable generated queries, and more understandable code.

The best bet is to perform multiple queries. This is the way I am doing it:

var bus = context.Set<BusEntity>().Where(x => x.IsDriving).ToList();
var busIDs = bus.Select(x => x.BusID).ToList();
var passengers = context.Set<PassengerEntity>().Where(x => x.IsAwake && busIDs.Contains(x.BusID)).ToList();
var passengerIDs = passengers.Select(x => x.PassengerID).ToList();
var carryOns = context.Set<CarryOnEntity>().Where(x => passengerIDs.Contains(x.PassengerID)).ToList();
var luggages = context.Set<LuggageEntity>().Where(x => passengerIDs.Contains(x.PassengerID)).ToList();
passengers.ForEach(x => {
    x.CarryOns = carryOns.Where(y => y.PassengerID == x.PassengerID).ToList();
    x.Luggages = luggages.Where(y => y.PassengerID == x.PassengerID).ToList();
});
bus.ForEach(x => x.Passengers = passengers.Where(y => y.BusID == x.BusID).ToList());

This generated 4 calls. Altogether, the SQL had about 40 lines. I hacked at it, removed the SELECT columns, and altered some ID's to match this question, it was something like this:

SELECT * FROM [dbo].[Busses] AS [Extent1]
    WHERE [Extent1].[IsDriving] = 1

SELECT * FROM [dbo].[Passengers] AS [Extent1]
    WHERE ([Extent1].[Awake] = 1) AND ([Extent1].[BusID] IN (......................))

SELECT * FROM [dbo].[CarryOns] AS [Extent1]
    WHERE [Extent1].[PassengerID] IN (......................)

SELECT * FROM [dbo].[Luggages] AS [Extent1]
    WHERE [Extent1].[PassengerID] IN (......................)

The EF Generated Query returns about 100 rows total across the 4 round-trip calls. So that means 4 calls to the database, but all very small, readable, and very quick.

I didn't time it, but whenever I pause on a breakpoint above this answer's code, and F5 to the other side of the result, it is instant. When I do the same thing for the Single-Call in my research, it took a solid second or more, noticeable lag running over that.

Suamere
  • 5,691
  • 2
  • 44
  • 58
  • For this exact scenario, CarryOns and Luggages have no WHERE Clause other than their relationship to Passenger. So those could actually just be `Include()`ed. But for demonstration, especially if you wanted a third, fourth, or further depth to your Clause Control, I kept it built out in my answer. – Suamere Sep 01 '18 at 18:24