I have a Chrome container (deployed using this Dockerfile) that renders pages on request from an App container.

The basic flow is:

  • App sends an HTTP request to Chrome and in response receives a WebSocket URL to use (e.g. ws://chrome.example.com:9222/devtools/browser/13400ef6-648b-4618-8e4c-b5c73db2a122)
  • App then uses that WebSocket URL to communicate further with Chrome, and to receive the rendered page. I am using the puppeteer library to connect to and communicate with the Chrome instance, using puppeteer.connect({ browserWSEndpoint: webSocketUrl }); (see the sketch below)
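Here's roughly what that flow looks like in code, as a minimal sketch. I'm assuming the initial HTTP call hits Chrome's standard /json/version endpoint and that global fetch (Node 18+) is available; the hostname is just taken from the example above:

const puppeteer = require('puppeteer');

async function render(url) {
  // Step 1: HTTP request to Chrome returns the browser's WebSocket URL.
  const res = await fetch('http://chrome.example.com:9222/json/version');
  const { webSocketDebuggerUrl } = await res.json();

  // Step 2: connect over that WebSocket and drive the browser.
  const browser = await puppeteer.connect({ browserWSEndpoint: webSocketDebuggerUrl });
  const page = await browser.newPage();
  await page.goto(url);
  const html = await page.content();
  await browser.disconnect();
  return html;
}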

For a single Chrome container this works really well.

But I'm trying to scale things up to have multiple Chrome containers in a Docker swarm.

The problem is, I think, that the WebSocket URL received by App is specific to the instance running in that particular Chrome container, so when App uses it (and there are now multiple Chrome containers), the WebSocket requests from App will not necessarily be routed to the right Chrome container.

What is the best way of dealing with this?


1 Answer

You've got the basic design correct, but the issue you're experiencing is with session "stickiness". However, instead of trying to re-route subsequent requests back to the appropriate machine, we should look for a way to avoid the "pre" request altogether.

The best way to do that is to have your Chrome docker image man-in-the-middle all HTTP "upgrade" requests. Every WebSocket client, including the puppeteer library (which is just a WebSocket client under the hood), issues this HTTP request before switching protocols. Doing this also obviates the need for a pre-connect call, since the proxying to Chrome happens on upgrade rather than by exposing a URL for the app to use. Here's a pretty basic example of doing this with the http-proxy module:

const http = require('http');
const httpProxy = require('http-proxy');
const puppeteer = require('puppeteer');

const proxy = httpProxy.createProxyServer();

http
  .createServer()
  .on('upgrade', async (req, socket, head) => {
    // Launch a fresh Chrome for this connection and proxy the
    // WebSocket traffic straight through to its DevTools endpoint.
    const browser = await puppeteer.launch();
    const target = browser.wsEndpoint();

    proxy.ws(req, socket, head, { target });
  })
  .listen(3000);
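With that proxy in place, the App side gets simpler too: instead of the two-step fetch-then-connect dance, it points puppeteer directly at the proxy's port, and each new connection triggers a fresh upgrade on whichever container the swarm routes it to. A minimal sketch, assuming the proxy above is published under the service hostname chrome.example.com (illustrative):

const puppeteer = require('puppeteer');

(async () => {
  // Each connect() is a WebSocket upgrade, so swarm's routing mesh can
  // send it to any replica; no pre-connect HTTP call is needed.
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'ws://chrome.example.com:3000',
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // ... consume the rendered page, then disconnect ...
  await browser.disconnect();
})();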

There are other benefits to this approach as well: you can limit things like concurrency and even inject scripts to be run at a later time. Those require a little more thought and preparation, but the overall idea remains the same. This also makes load-balancing trivial since there's no need to make routing sticky.
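To illustrate the concurrency point, here's a hedged sketch of one way to cap sessions in the upgrade handler above (MAX_SESSIONS is an illustrative name, not anything built into http-proxy or puppeteer):

const http = require('http');
const httpProxy = require('http-proxy');
const puppeteer = require('puppeteer');

const proxy = httpProxy.createProxyServer();
const MAX_SESSIONS = 5; // illustrative cap, tune per container
let active = 0;

http
  .createServer()
  .on('upgrade', async (req, socket, head) => {
    // Refuse the WebSocket handshake when this container is at capacity;
    // a load-balancer can then retry against another replica.
    if (active >= MAX_SESSIONS) {
      socket.end('HTTP/1.1 503 Service Unavailable\r\n\r\n');
      return;
    }
    active++;
    const browser = await puppeteer.launch();
    // Free the slot and kill Chrome when the client goes away.
    socket.on('close', () => {
      active--;
      browser.close();
    });
    proxy.ws(req, socket, head, { target: browser.wsEndpoint() });
  })
  .listen(3000);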

If this is something you're interested in implementing, all that work is largely done for you in the browserless repo. It even allows for things like concurrency limitations, session time limitations, and includes a feature-rich IDE. You can find more docs on that project here.

  • That's really helpful, and the proxy approach seems neat. So you have puppeteer both at the App and at the Proxy, with the Proxy using puppeteer to create and manage browser instances, and the App using puppeteer to communicate with Proxy over a single websocket? Is that the approach? One thing that still bothers me is scaling... scaling of chrome instances happens in the same container, i.e. on the same node/server. If you need more processing power you need to redeploy on a bigger server. I'd like the flexibility of scaling across nodes... i.e. chrome processing shared between nodes. – drmrbrewer Mar 04 '18 at 09:26
  • In other words, if I suddenly need more processing power I'd like to be able to spin up a new server and add it to the docker swarm, so that the multiple chrome containers can be spread across more nodes, thereby easing pressure. And when no longer needed, I can simply kill the extra server. Is this possible with the architecture you're using? – drmrbrewer Mar 04 '18 at 09:30
  • 1
    Yes — that’s the approach I’ve taken at browserless.io. The only thing you’ll need at that point is a load-balancer of some kind (not sure if that’s a feature of Docker swarm or not). Kubernetes might also play a larger role in that auto scaling as well. – browserless Mar 04 '18 at 16:17