I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously: with queue they all fail, whereas only some fail when I have the command execute directly.

I've taken the code straight from the examples and replaced execute with queue which I expected to work, except the code doesn't wait for the result. Is there a way to achieve this anyway?

So this works:

const screen = await cluster.execute(req.query.url);

But this breaks:

const screen = await cluster.queue(req.query.url);

Here's the full example with queue:

const express = require('express');
const app = express();
const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
    });
    await cluster.task(async ({ page, data: url }) => {
        // make a screenshot
        await page.goto('http://' + url);
        const screen = await page.screenshot();
        return screen;
    });

    // setup server
    app.get('/', async function (req, res) {
        if (!req.query.url) {
            return res.end('Please specify url like this: ?url=example.com');
        }
        try {
            const screen = await cluster.queue(req.query.url);

            // respond with image
            res.writeHead(200, {
                'Content-Type': 'image/jpg',
                'Content-Length': screen.length //variable is undefined here
            });
            res.end(screen);
        } catch (err) {
            // catch error
            res.end('Error: ' + err.message);
        }
    });

    app.listen(3000, function () {
        console.log('Screenshot server listening on port 3000.');
    });
})();

What am I doing wrong here? I'd really like to use queuing because without it every incoming request appears to slow down all the other ones.

Thomas Dondorf
G_V

1 Answer

Author of puppeteer-cluster here.

Quote from the docs:

cluster.queue(..): [...] Be aware that this function only returns a Promise for backward compatibility reasons. This function does not run asynchronously and will immediately return.

cluster.execute(...): [...] Works like Cluster.queue, just that this function returns a Promise which will be resolved after the task is executed. In case an error happens during the execution, this function will reject the Promise with the thrown error. There will be no "taskerror" event fired.

When to use which function:

  • Use cluster.queue if you want to queue a large number of jobs (e.g. list of URLs). The task function needs to take care of storing the results by printing them to console or storing them into a database.
  • Use cluster.execute if your task function returns a result. This still queues the job, so it is like calling queue and additionally waiting for the job to finish. In this scenario there is usually an idle cluster waiting, which is put to use when a request hits the server (like in your example code).
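
As a rough illustration of the two patterns above, here is a minimal sketch. The helper names `queueAll` and `screenshotFor` are hypothetical and not part of puppeteer-cluster; both assume they receive a cluster created with `Cluster.launch(...)` whose task function has already been set with `cluster.task(...)`.

```javascript
// Batch pattern: add many jobs and move on.
// cluster.queue returns immediately, so the task function itself has to
// store or print its results; nothing useful comes back here.
function queueAll(cluster, urls) {
    for (const url of urls) {
        cluster.queue(url);
    }
}

// Request pattern: the job is still queued, but the returned Promise
// resolves with whatever the task function returned (or rejects if it threw).
async function screenshotFor(cluster, url) {
    return cluster.execute(url);
}
```

In a batch script, `queueAll` would typically be followed by `await cluster.idle()` and `await cluster.close()`, while `screenshotFor` is the shape that fits a request handler that has to send the result back.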

So you definitely want to use cluster.execute, since you want to wait for the result of the task function. The reason you do not see any errors with cluster.queue is (as quoted above) that its errors are emitted via a taskerror event, whereas errors from cluster.execute are thrown directly (the Promise is rejected). Most likely your jobs fail in both cases, but the failure is only visible with cluster.execute.
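
To make the two error paths concrete, here is a hedged sketch rather than a definitive implementation. The helper names `reportQueueErrors` and `handleScreenshot` are made up for illustration. The `image/png` content type is an assumption based on `page.screenshot()` producing PNG by default; the question's code used `image/jpg`.

```javascript
// Errors from jobs added with cluster.queue surface only through the
// 'taskerror' event, so without a listener they go unnoticed.
function reportQueueErrors(cluster, log = console.error) {
    cluster.on('taskerror', (err, data) => {
        log(`Error while processing ${data}: ${err.message}`);
    });
}

// Errors from cluster.execute reject the returned Promise, so an ordinary
// try/catch around the await is enough.
async function handleScreenshot(cluster, req, res) {
    try {
        const screen = await cluster.execute(req.query.url);
        res.writeHead(200, {
            'Content-Type': 'image/png', // assumption: default screenshot format
            'Content-Length': screen.length,
        });
        res.end(screen);
    } catch (err) {
        res.end('Error: ' + err.message);
    }
}
```

With Express, this could be wired up as `app.get('/', (req, res) => handleScreenshot(cluster, req, res))` after the check for a missing `url` parameter.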

Thomas Dondorf
  • Ah I see, thank you! I somehow completely missed that part. Just for clarity, execute puts the specified call in a queuing mechanism where the cluster assigns workers that pick up the tasks when they are available? If that's the case, is setting maxConcurrency (number of workers) equal to the available cores a bad idea as it could burden a server with 100% cpu usage across all cores? Thank you for your time and puppeteer-cluster, it's saving me weeks of headaches. I can post a new question if needed. – G_V Aug 06 '19 at 07:49
  • Yes, I'll edit my answer later to clarify that part. `cluster.execute` also queues the job. The only differences are the blocking and the error handling. Ideally, you want to be close to 100% CPU utilization, but how many workers your system can handle depends on the task you are doing and the CPU you have. See [this question/answer](https://stackoverflow.com/a/57295869/5627599) for more information. – Thomas Dondorf Aug 06 '19 at 10:59
  • @ThomasDondorf Extremely useful! thank you, I'm migrating from PhantomJS to Puppeteer and was about to make a DEQUE controller until I found your repo – Gabriel Balsa Cantú Oct 05 '20 at 18:05