
I have a very common GraphQL schema like this (pseudocode):

Post {
  commentsPage(skip: Int, limit: Int) {
    total: Int
    items: [Comment]
  }
}

So, to avoid the n+1 problem when requesting multiple Post objects, I decided to use Facebook's DataLoader.

Since I'm working on a Nest.js 3-tier layered application (Resolver-Service-Repository), I have a question:

should I wrap my repository methods with DataLoader, or should I wrap my service methods with DataLoader?

Below is an example of a service method that returns a page of Comments (i.e. this method is called from the commentsPage field resolver). Inside the service method I use two repository methods (#count and #find):

@Injectable()
export class CommentsService {
    constructor(
        private readonly repository: CommentsRepository,
    ) {}

    async getCommentsPage(postId, dateStart, dateEnd, skip, limit): Promise<PaginatedComments> {
        const total = await this.repository.getCount(postId, dateStart, dateEnd);
        const itemsDocs = await this.repository.find(postId, dateStart, dateEnd, skip, limit);
        const items = this.mapDbResultToGraphQlType(itemsDocs);
        return new PaginatedComments(total, items);
    }
}

So, should I create individual DataLoader instances for each repository method (#count, #find, etc.), or should I just wrap my entire service method with DataLoader (so my commentsPage field resolver works with the DataLoader instead of the service)?

WelcomeTo

1 Answer


Disclaimer: I am not an expert in Nest.js, but I have written a good bunch of dataloaders as well as worked with automatically generated dataloaders. I hope I can give a bit of insight nonetheless.

What is the actual problem?

While your question seems to be a relatively simple either-or question, it is probably much more difficult than that. I think the actual problem is the following: whether or not to use the dataloader pattern for a specific field needs to be decided on a per-field basis. The repository+service pattern, on the other hand, tries to abstract away this decision by exposing abstract and powerful ways of accessing data. One way out would be to simply "dataloaderify" every method of your service. Unfortunately, in practice this is not really feasible. Let's explore why!

Dataloader is made for key-value-lookups

Dataloader provides a promise cache to reduce duplicated calls to the database. For this cache to work, all requests need to be simple key-value lookups (e.g. userByIdLoader, postsByUserIdLoader). This quickly becomes insufficient, as in your example, where the request to the repository has a lot of parameters:

this.repository.find(postId, dateStart, dateEnd, skip, limit);

Sure, technically you could make { postId, dateStart, dateEnd, skip, limit } your key and then somehow hash the contents to generate a unique key.
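A sketch of that composite-key idea using DataLoader's cacheKeyFn option (the type and function names here are mine, not from the question) — the point is that two structurally equal requests must serialize to the same cache key regardless of property order:

```typescript
// Hypothetical composite key matching the repository call's parameters.
type PageKey = {
  postId: string;
  dateStart: string;
  dateEnd: string;
  skip: number;
  limit: number;
};

// Serialize deterministically: passing the sorted key list as the
// JSON.stringify replacer fixes the property order in the output,
// so { a, b } and { b, a } produce identical cache keys.
function pageCacheKey(key: PageKey): string {
  return JSON.stringify(key, Object.keys(key).sort());
}

// Wiring it up would look something like:
// new DataLoader(batchFn, { cacheKeyFn: pageCacheKey });
```

Note that this only solves the caching half; the batch function behind it still has to answer many such keys in one database round trip, which is the hard part discussed next.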

Writing Dataloader queries is an order of magnitude harder than normal queries

When you implement a dataloader query, it suddenly has to work for a list of the inputs the initial query needed. Here is a simple SQL example:

SELECT * FROM user WHERE id = ?
-- Dataloaded
SELECT * FROM user WHERE id IN ?
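For the simple id case, the batch function DataLoader expects is straightforward: collect N ids, run one IN query, and return results aligned to the input key order (DataLoader requires results[i] to correspond to keys[i]). A minimal sketch, where queryUsersByIds stands in for whatever repository method runs the IN query:

```typescript
// Hypothetical shape of a row returned by the database layer.
interface UserRow {
  id: string;
  name: string;
}

// One call with all collected ids, one query, results re-ordered to
// match the input keys; missing ids become null.
async function batchUsersById(
  ids: readonly string[],
  queryUsersByIds: (ids: readonly string[]) => Promise<UserRow[]>,
): Promise<(UserRow | null)[]> {
  const rows = await queryUsersByIds(ids); // e.g. SELECT * FROM user WHERE id IN (...)
  // The database returns rows in arbitrary order, so index them first.
  const byId = new Map<string, UserRow>();
  for (const row of rows) byId.set(row.id, row);
  return ids.map((id) => byId.get(id) ?? null);
}

// Wiring it up would look something like:
// const userLoader = new DataLoader((ids) => batchUsersById(ids, db.findUsers));
```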

Okay now the repository example from above:

SELECT * FROM comment WHERE post_id = ? AND date < ? AND date > ? OFFSET ? LIMIT ?
-- Dataloaded
???

I have sometimes written queries that work for two parameters, and they already become very difficult problems. This is why most dataloaders are simple load-by-id lookups. This thread on Twitter discusses how a GraphQL API should only expose what can be efficiently queried. If you create service methods with powerful filters, you have the same problem even if your GraphQL API does not expose these filters.

Okay so what is the solution?

The first thing, to my understanding, that Facebook does is match fields and service methods very closely. You could do the same. This way you can decide in the service method whether or not to use a dataloader. For example, I don't use dataloaders in root queries (e.g. { getPosts(filter: { createdBefore: "...", user: 234 }) { ... } }) but in subfields of types that appear in lists (e.g. { getAllPosts { comments { ... } } }). The root query is not executed in a loop and is therefore not exposed to the n+1 problem.
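That per-field decision can be sketched in plain resolver code (the names ctx.commentsLoader and ctx.postsService are my assumptions, not Nest.js APIs):

```typescript
// Hypothetical resolver map illustrating the per-field decision:
// root queries call the service directly, list subfields use a loader.
const resolvers = {
  Query: {
    // Root query: executed once per request, never in a loop,
    // so calling the service directly is fine.
    getPosts: (_: unknown, args: { filter: unknown }, ctx: any) =>
      ctx.postsService.findPosts(args.filter),
  },
  Post: {
    // Subfield of a type that appears in lists: executed once per post.
    // This is where the n+1 problem lives, so the loader pays off here.
    comments: (post: { id: string }, _args: unknown, ctx: any) =>
      ctx.commentsLoader.load(post.id),
  },
};
```

In Nest.js the same split maps naturally onto a @Query() handler versus a @ResolveField() handler on the Post resolver.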

Your repository now exposes what can be "efficiently queried" (as in Lee's tweet), like foreign/primary key lookups or filtered find-all queries. The service can then wrap, for example, the key lookups in a dataloader. Often I end up filtering small lists in my business logic. I think this is perfectly fine for small apps but might become problematic when you scale. The GraphQL Relay helpers for JavaScript do something similar when you use the connectionFromArray function: the pagination is not done at the database level, and this is probably okay for 90% of connections.
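Applied to the question's commentsPage field, that approach might look like the sketch below: batch one foreign-key lookup for all requested posts, group in memory, then paginate the small per-post list in business logic, connectionFromArray-style. All names here (queryCommentsByPostIds, paginate) are assumptions for illustration:

```typescript
// Hypothetical comment row as returned by the repository.
interface Comment {
  id: string;
  postId: string;
  date: string;
}

// Batch function: one efficiently-queryable foreign-key lookup for all
// requested posts, grouped back per post in input-key order.
async function batchCommentsByPostId(
  postIds: readonly string[],
  queryCommentsByPostIds: (ids: readonly string[]) => Promise<Comment[]>,
): Promise<Comment[][]> {
  const rows = await queryCommentsByPostIds(postIds);
  const grouped = new Map<string, Comment[]>();
  for (const id of postIds) grouped.set(id, []);
  for (const row of rows) grouped.get(row.postId)?.push(row);
  return postIds.map((id) => grouped.get(id) ?? []);
}

// Pagination applied in business logic rather than in SQL: fine for
// small lists, a scaling concern for very large ones.
function paginate<T>(items: T[], skip: number, limit: number) {
  return { total: items.length, items: items.slice(skip, skip + limit) };
}
```

The service's getCommentsPage would then call the loader backed by batchCommentsByPostId and run paginate on the result, instead of pushing skip/limit down into a dataloader key.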

Some sources to consider

Herku
  • Wow, this is a FANTASTIC answer. In answering the original question, you also answered a few other questions I had, and explained it all so clearly. Thank you! – helloitsjoe Feb 23 '20 at 17:04
  • I use the "create a hash from multiple parameters and query for all of them" method like you said. And again, like you said, writing those queries can get out of hand. And if you have simple validations or some error-handling steps, it really loses its value. I still don't fully have an answer for this type of situation, but your answer has a lot of good points. Thanks! – Onur Önder Aug 21 '20 at 17:54