In short: yes. We have done it, and it works pretty well.
The trick is that you have to think beyond just the API and worker applications when it comes to horizontal scaling. If you want a push architecture, it needs to be asynchronous from the very beginning.
To achieve this, we used a queueing system, namely RabbitMQ.
Imagine this scenario for report generation, which can take up to 10 minutes (rough code sketches follow the list):
- Client connects to our GraphQL API (instance 1) via WebSocket
- Client sends a command to generate a report via WebSocket
- API generates a token for the command, puts the report-generation command on the CommandQueue (in RabbitMQ), and returns the token to the Client.
- Client subscribes to its command's result events, using the token
- Some backend Worker picks up the command and executes the report generation procedure
- During this time GraphQL API (instance 1) dies
- Client automatically reconnects to GraphQL API (instance 2)
- Client renews the subscription with the previously acquired token
- The Worker finishes and publishes the results to the EventsQueue (RabbitMQ)
- ALL of our GraphQL instances receive the ReportGenerationDoneEvent and check if anybody is listening for its token.
- GraphQL API (instance 2) sees that the Client is awaiting the results and pushes them via WebSocket.
- GraphQL API (instances 3-100) ignore the ReportGenerationDoneEvent.
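To make the flow above concrete, here is a minimal sketch of the API side in TypeScript, assuming a Node stack with amqplib and the graphql-subscriptions package (v2 API). The names (`CommandQueue`, `REPORT_DONE`, `generateReport`) and the schema shape are illustrative assumptions, not the original implementation:

```typescript
// Sketch only: a GraphQL mutation that enqueues a command and returns a
// correlation token, plus a subscription filtered by that token.
import amqp from 'amqplib';
import { randomUUID } from 'crypto';
import { PubSub, withFilter } from 'graphql-subscriptions';

const pubsub = new PubSub();

// One channel per API instance (RabbitMQ on localhost is assumed here).
const connection = await amqp.connect('amqp://localhost');
const channel = await connection.createChannel();
await channel.assertQueue('CommandQueue', { durable: true });

export const resolvers = {
  Mutation: {
    // Step 3: generate a token, enqueue the command, return the token.
    generateReport: async (_: unknown, args: { filters: unknown }) => {
      const token = randomUUID();
      channel.sendToQueue(
        'CommandQueue',
        Buffer.from(JSON.stringify({ type: 'GenerateReport', token, filters: args.filters })),
        { persistent: true },
      );
      return { token };
    },
  },
  Subscription: {
    // Steps 4 and 8: the client subscribes (or re-subscribes) with its token;
    // only events carrying the same token reach this client.
    reportGenerationDone: {
      subscribe: withFilter(
        () => pubsub.asyncIterator('REPORT_DONE'),
        (payload: { reportGenerationDone: { token: string } }, variables: { token: string }) =>
          payload.reportGenerationDone.token === variables.token,
      ),
    },
  },
};
```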
It is quite extensive, but with simple abstractions you do not have to think about all this complexity: a new process using this route takes roughly 30 lines of code across several services.
And what is brilliant about it: you end up with nice horizontal scaling, event replayability (retries), separation of concerns (client, API, workers), data pushed to the client as quickly as possible, and, as you mentioned, no bandwidth wasted on the "are we done yet?" polling requests.
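To give a feel for the "~30 lines across several services" claim, here is a rough sketch of the worker side, plus one way every API instance can receive the event. The fanout exchange (`EventsExchange`) is my assumption about the wiring; the real system only needs some mechanism that delivers the event to all instances:

```typescript
// Worker sketch: consume commands from the CommandQueue, do the slow work,
// then broadcast the result event to all API instances (assumed fanout wiring).
import amqp, { Channel } from 'amqplib';

declare function generateReport(filters: unknown): Promise<unknown>; // the actual 10-minute job

async function runWorker() {
  const connection = await amqp.connect('amqp://localhost');
  const channel = await connection.createChannel();
  await channel.assertQueue('CommandQueue', { durable: true });
  await channel.assertExchange('EventsExchange', 'fanout', { durable: true });
  await channel.prefetch(1); // one long-running report per worker at a time

  await channel.consume('CommandQueue', async (msg) => {
    if (!msg) return;
    const command = JSON.parse(msg.content.toString());
    const report = await generateReport(command.filters); // may take up to 10 minutes
    channel.publish(
      'EventsExchange',
      '',
      Buffer.from(JSON.stringify({ type: 'ReportGenerationDoneEvent', token: command.token, report })),
    );
    channel.ack(msg); // ack only after the event is out, so a crash leads to a retry
  });
}

// Runs in every API instance: bind a private queue to the fanout exchange and
// forward each event into the local GraphQL PubSub (see the previous sketch).
async function listenForEvents(channel: Channel, pubsub: { publish(t: string, p: unknown): void }) {
  const { queue } = await channel.assertQueue('', { exclusive: true });
  await channel.bindQueue(queue, 'EventsExchange', '');
  await channel.consume(queue, (msg) => {
    if (!msg) return;
    pubsub.publish('REPORT_DONE', { reportGenerationDone: JSON.parse(msg.content.toString()) });
    channel.ack(msg);
  });
}
```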
Another cool thing is that whenever the user opens the reports list in our panel, they see the reports that are currently being generated and can subscribe to their changes, so they do not have to refresh the list manually.
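On the client side, the reconnect-and-resubscribe part (steps 7-8 in the scenario) can look roughly like this with a library such as graphql-ws; the operation name and the two helper functions are made-up placeholders:

```typescript
// Client sketch: any transport that can re-subscribe with the saved token
// after a reconnect works the same way; graphql-ws is just one option.
import { createClient } from 'graphql-ws';

declare function requestReportGeneration(filters: unknown): Promise<string>; // hypothetical helper: sends the mutation, returns the token
declare function renderReport(report: unknown): void;                        // hypothetical helper: shows the finished report

const client = createClient({
  url: 'wss://api.example.com/graphql',
  retryAttempts: Infinity, // keep reconnecting if an API instance dies
});

// Steps 2-3: request the report and keep the token.
const token = await requestReportGeneration({ month: '2019-05' });

// Steps 4, 7-8: subscribe with the token; if the socket drops and the client
// reconnects to another instance, subscribing again with the same token is
// all that is needed to keep receiving the result.
client.subscribe(
  {
    query: `subscription ($token: ID!) {
      reportGenerationDone(token: $token) { reportId url }
    }`,
    variables: { token },
  },
  {
    next: ({ data }) => renderReport(data?.reportGenerationDone),
    error: (err) => console.error('subscription error', err),
    complete: () => console.log('report delivered'),
  },
);
```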
Good thinking on the SocketCluster. It would optimize step 10 in the above scenario, but for now we do not see any performance issues with broadcasting the ReportGenerationDoneEvent to the whole API cluster. With more instances or a multi-region architecture, it would be a must, as it would allow for better scaling and sharding.
It is important to understand that SocketCluster operates at the communication layer (WebSockets), while the logical API layer (GraphQL) sits above it. To make a GraphQL Subscription work, you just need a communication protocol that can push information to the user, and WebSockets allow that.
I think using SocketCluster is a good design choice, but remember to iterate on the implementation. Only use SocketCluster when you plan to have many sockets open at any single point in time. Also, subscribe only when necessary, because WebSockets are stateful and require management and heartbeats.
If you are further interested in the asynchronous backend architecture I described above, read up on the CQRS and Event Sourcing patterns.
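If it helps to see the shape of it, the messages in a setup like this tend to reduce to two small contracts. This is only an illustration of the CQRS split, with made-up fields for the report example, not code from the system above:

```typescript
// Commands express intent and are consumed exactly once by some worker.
interface GenerateReportCommand {
  type: 'GenerateReport';
  token: string;    // correlation token handed back to the client
  filters: unknown; // whatever the report needs
  issuedAt: string; // ISO timestamp
}

// Events state facts and are broadcast to every interested consumer.
interface ReportGenerationDoneEvent {
  type: 'ReportGenerationDoneEvent';
  token: string;    // same token, so API instances can match subscribers
  reportId: string;
  finishedAt: string;
}

// With Event Sourcing, events like the one above are also appended to a log,
// so state can be rebuilt and handlers can be replayed (retries for free).
```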