1

With EF Core5 and Microsoft.EntityFrameworkCore.Proxies, I'm seeing lots of N+1 queries when it comes to nested objects.

edit To be clear, this question isn't even about N+1 queries, as the accepted answer says. And, as I mention in the comments, GraphQL precludes the developer from using Include() to shape the query because the query shape is left to the user. Traditional GraphQL optimizations, such as implemented by Apollo, use query aggregation in combination with lazy loading to solve the problem I'm describing in this question.

For example, consider an API call to GET /machines/, which is meant to return a list of machines. Each machine has Parts, and and each Part has Settings. The API call is intended to return all of these nested objects in the same response.

To keep it simple, here's the controller endpoint:

public async Task<IActionResult> List() 
{
    MachineProfile[] machineProfiles = await Db<MachineProfile>().Select(x => x).ToArrayAsync();
    Dictionary<string, object>[] res = machineProfiles.Select(ToJson).ToArray();
    return ApiResult(res);
}

public Dictionary<string, object> ToJson(MachineProfile m) 
{
    return new Dictionary<string, object>() {
        [nameof(m.Id)] = m.Id,
        [nameof(m.Parts)] = m.Parts.Select(ToJson).ToArray(),
    };
}

public Dictionary<string, object> ToJson(MachinePart m) 
{
    return new Dictionary<string, object>() {
        [nameof(m.Id)] = m.Id,
        [nameof(m.Settings)] = m.Settings.Select(ToJson).ToArray(),
    };
}

public Dictionary<string, object> ToJson(MachineSetting m) 
{
    return new Dictionary<string, object>() {
        [nameof(m.Id)] = m.Id,
    };
}

And here's what the JSON result looks like.

The number of SQL queries executed are:

  • 1x for MachineProfiles
  • 2x for MachineParts (one for each machine)
  • 11x for MachineSettings (one for each part, even if the same part appears twice)

This is a classic example of an horribly-optimized N+1 query.

Does there exist any package or well-known solution to avoid this?

There exist GraphQL solutions to this problem, which inject thread blocks into the query tree traversal. This is similar to how Microsoft.EntityFrameworkCore.Proxies works, except that it defers and batches the queries to solve the N+1 problem. I'm wondering if it's worth porting this code to EF Core, or if there's an existing solution.

--

Edit1 I tried using the GraphQL package because I plan on building my server as GraphQL anyways. I do see that it actually improves the query count.

For example, if I use this query:

{ getMachineProfiles { id parts { id settings { id } } } }

It produces the same JSON result, but improves performance slightly by appearing to cache the queries internal to the call. In other words, the same setting appearing twice in the same response will not cause a second database call.

But 11 database queries is still WAY more than "should" be used (an optimized version of this API call would only require 3 database queries).

Zane Claes
  • 14,732
  • 15
  • 74
  • 131
  • No. If you refer to the accepted answer, the question wasn't actually even about N+1 queries. – Zane Claes Oct 14 '20 at 21:28
  • Using `Include()` to select multiple tables (basically a `JOIN`) doesn't work? – Progman Oct 14 '20 at 21:30
  • Please read the question again. This is explicitly about traversing a lazy-loaded GraphQL query. `Include()` is a complete non-starter. You seem to not understand the database issues being described. – Zane Claes Oct 14 '20 at 21:32
  • Why did you implement lazy loading? Anyway, your question is too broad because you show two quite different APIs and this paragraph "I have previously written code..." doesn't seem essential. Or if it is, it's too vague. In other words, stick to the actual question. – Gert Arnold Oct 14 '20 at 21:36
  • Because GraphQL, by definition, precludes foreknowledge of the query structure. The fields returned are up to the user, not to the developer. But THAT is an irrelevant detail. The question is about avoiding N+1 queries with lazy loading. My editorialism about having done it before is meant to demonstrate that I am certain it can be done, lest anyone claim otherwise. – Zane Claes Oct 14 '20 at 21:38
  • And what "two different APIs" are you referring to, @Gert – Zane Claes Oct 14 '20 at 21:47
  • How would you even avoid another query, if the idea is to "lazy load?" By definition, wouldn't you want to wait to execute the follow-on query? – Robert Harvey Oct 14 '20 at 21:54
  • @RobertHarvey First, I did NOT say thay I wanted to "avoid another query." Rather, the optimized form of this API call involves 3 queries total: one for each data type. If you're not familiar with N+1 queries, please read about them. This is a textbook case. And I explained in detail how this is achieved in a well-formed GraphQL server already. Namely, you use proxy objects to block threads and defer the queries until they can be aggregated. Like I said, Apollo does it, we did it at my last job in Ruby on Rails, etc. – Zane Claes Oct 14 '20 at 21:56
  • You should use `Include` dynamically, based on wheter or not a certain field has been requested by the GraphQL query, so you only do a `JOIN` if the query actually requires this, or avoids it completely if the data is not requested. – Alejandro Oct 14 '20 at 22:23
  • @Alejandro no, you can't, because you don't know if the Include will be necessary until you sufficiently traverse the query tree and examine the results. See my answer below. There's a package that implements the exact solution I described in the OP. – Zane Claes Oct 14 '20 at 22:26
  • 1
    Looks like you've found a solution, so... Good luck. – Robert Harvey Oct 14 '20 at 22:28

1 Answers1

0

Looks like there's a .NET port of Facebook's DataLoader, which implements the solution I described in the OP (query aggregation).

This solution comes with a dependency upon the GraphQL package, but it can be made to optimize a standard RESTful endpoint by using GraphQL under-the-hood with a precompiled query.

Zane Claes
  • 14,732
  • 15
  • 74
  • 131