Clustering is basically spawning multiple instances of the same process so that you get process-based parallelism.
Here's how it works: say you spawn 4 instances of your app and send it 1000 requests.
First instance will take the first request and process it.
Second instance will take the second request and process it.
Third instance will take the third request and process it.
Fourth instance will take the fourth request and process it.
Whichever instance finishes its request first will take the 5th request, and so on.
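The hand-off described above can be sketched as a small simulation. This only models the "first free instance takes the next request" rule as stated here, not any particular cluster implementation, which may distribute requests differently. The function name assign_requests and the sample durations are made up for illustration.

```python
import heapq

def assign_requests(durations, num_instances):
    """Return (instance, start, finish) for each request, in arrival order."""
    # Each heap entry is (time the instance becomes free, instance index).
    free_at = [(0, i) for i in range(num_instances)]
    heapq.heapify(free_at)
    schedule = []
    for duration in durations:
        # The instance that frees up earliest takes the next request.
        start, instance = heapq.heappop(free_at)
        finish = start + duration
        schedule.append((instance, start, finish))
        heapq.heappush(free_at, (finish, instance))
    return schedule

# Hypothetical workload: six requests with made-up durations, four instances.
schedule = assign_requests([5, 3, 4, 2, 1, 1], num_instances=4)
for request_no, (instance, start, finish) in enumerate(schedule, start=1):
    print(f"request {request_no}: instance {instance}, t={start}..{finish}")
```

Instances are numbered from 0 here, so the 5th request goes to instance 3 (the fourth one), because it was the first to finish.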
What's wrong with process-based parallelism is this:
Let's say you get 4 requests and you know that they will take 5 minutes (300 seconds), 40 seconds, 30 seconds and 45 seconds respectively to process. They all arrive at time t_0.
You distribute each request to an instance.
At t_0 + 30 seconds, the third instance will finish processing its request.
At t_0 + 40 seconds, the second instance will finish processing its request.
At t_0 + 45 seconds, the fourth instance will finish processing its request.
From t_0 + 45 seconds until t_0 + 300 seconds, the other three instances will sit idle and do a whole lot of nothing while the first instance is still processing that 300-second-long request.
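To put numbers on that idle time, here is a rough calculation using the four durations from the example. It assumes no new requests arrive before t_0 + 300 seconds, and the variable names are just for illustration.

```python
# Durations in seconds for the request each instance received at t_0.
durations = {"first": 300, "second": 40, "third": 30, "fourth": 45}

# Everything is done only when the slowest request finishes.
makespan = max(durations.values())

# Idle time per instance: how long it waits after finishing its own request.
idle = {name: makespan - d for name, d in durations.items()}
total_idle = sum(idle.values())

# Fraction of the available instance-seconds actually spent working.
utilization = sum(durations.values()) / (makespan * len(durations))

print(idle)
print(total_idle)
print(round(utilization, 3))
```

Under those assumptions, the three short-request instances spend 785 instance-seconds between them waiting, and only about a third of the available capacity is used.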