25

I am using Selenium to run headless Chrome with the following command:

system "LC_ALL=C google-chrome --headless --enable-logging --hide-scrollbars --remote-debugging-port=#{debug_port} --remote-debugging-address=0.0.0.0 --disable-gpu --no-sandbox --ignore-certificate-errors &"

However, headless Chrome appears to consume too much memory and CPU. Does anyone know how to limit the CPU and memory usage of headless Chrome, or of a workaround?

Thanks in advance.

undetected Selenium
Ahmad Hijazi
  • How are you measuring `too much memory and cpu`? Does your _usecase_ have a specification about _memory and cpu_ usage? – undetected Selenium Jun 05 '18 at 14:49
  • @DebanjanB yes, whenever users start using chrome headless memory and cpu become very high. – Ahmad Hijazi Jun 05 '18 at 15:16
  • I too have large memory usage with running selenium tests with google-chrome-headless. Averaging around 36GB of memory from my 1501 tests. That's right GB! I'm running chromedriver 2.42.591071 with chrome 69 under Debian Jessie. I run my tests parallel using 32 threads. – map7 Oct 02 '18 at 04:53
  • In my case I managed to get the memory usage down from 36GB to 14GB because I was using deferred garbage collection within rspec. I turned that off and this is what saved a lot of usage. – map7 Oct 08 '18 at 00:59

2 Answers

48

There has been a lot of discussion about the unpredictable CPU and memory consumption of headless Chrome sessions.

As per the discussion "Building headless for minimum cpu+mem usage", CPU and memory usage can be optimized as follows:

  • Using either a custom proxy or C++ ProtocolHandlers, you could return stub 1x1 pixel images or even block images entirely.

  • The Chromium team is working on adding programmatic control over when frames are produced. Currently headless Chrome still tries to render at 60 fps, which is rather wasteful. Many pages do need a few frames (maybe 10-20 fps) to render properly (due to usage of requestAnimationFrame and animation triggers), but we expect there are a lot of CPU savings to be had here.

  • MemoryInfra should help you determine which component is the biggest consumer of memory in your setup.

  • Example usage:

      $ headless_shell --remote-debugging-port=9222 --trace-startup=*,disabled-by-default-memory-infra http://www.chromium.org
    
  • Chromium is always going to use as many resources as are available to it. If you want to effectively limit its utilization, you should look into using cgroups.
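
The cgroups suggestion can be sketched in Ruby, matching the launch command in the question. This is only a minimal sketch, assuming a Linux host with systemd and cgroup v2: it wraps Chrome in a transient scope via `systemd-run` and sets the `MemoryMax` and `CPUQuota` resource-control properties. The helper name `limited_chrome_command` and the limit values are illustrative and do not come from the discussion above.

```ruby
# Builds an argv array that starts headless Chrome inside a transient
# systemd scope (a cgroup) with memory and CPU ceilings.
# Assumptions: Linux with systemd and cgroup v2; the helper name and the
# default limits are hypothetical and should be tuned per workload.
def limited_chrome_command(debug_port:, memory_max: "1G", cpu_quota: "100%")
  [
    "systemd-run", "--scope",
    "-p", "MemoryMax=#{memory_max}",  # memory ceiling for the whole scope
    "-p", "CPUQuota=#{cpu_quota}",    # 100% corresponds to one full CPU
    "google-chrome", "--headless", "--disable-gpu",
    "--remote-debugging-port=#{debug_port}"
  ]
end

cmd = limited_chrome_command(debug_port: 9222, memory_max: "2G", cpu_quota: "150%")
puts cmd.join(" ")
# To actually launch it: pid = Process.spawn(*cmd)
```

Creating a system-level scope like this usually requires root privileges; `systemd-run --user` may work instead when the relevant controllers are delegated to the user session. Because the limits apply to the scope as a whole, Chrome's renderer and helper processes count against the same ceiling.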


With those points in mind, here are some common best practices to adopt when running headless browsers in a production environment:

Fig: Volatile resource usage of Headless Chrome

  • Don't run a headless browser:

By all accounts, if at all possible, just don't run a headless browser. Headless browsers are unpredictable and resource-hungry. Almost everything you can do with a browser (save for interpreting and running JavaScript) can be done with simple Linux tools. There are libraries that offer elegant Node APIs for fetching data via HTTP requests and scraping, if that's your end goal.

  • Don't run a headless browser when you don't need to:

Some users try to keep the browser open even when it is not in use, so that it's always available for connections. While this might be a good strategy to help expedite session launch, it'll only end in misery after a few hours. This is largely because browsers like to cache stuff and slowly eat more memory. Any time you're not actively using the browser, close it!
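
One way to make "close it when you're done" hard to forget is to tie the browser's lifetime to a block. The following is a minimal Ruby sketch under stated assumptions: `with_process` is a hypothetical helper (not part of Selenium or any library), and the demo uses `sleep` as a stand-in for the headless Chrome command.

```ruby
# Runs a command for the duration of the block and always terminates it
# afterwards, even if the block raises. `with_process` is a hypothetical
# helper name; pass the Chrome argv from your own setup.
def with_process(*argv)
  pid = Process.spawn(*argv)
  yield pid
ensure
  if pid
    begin
      Process.kill("TERM", pid)  # ask the process to exit
    rescue Errno::ESRCH
      # the process is already gone; nothing to signal
    end
    begin
      Process.wait(pid)          # reap it so no zombie is left behind
    rescue Errno::ECHILD
      # the process was already reaped
    end
  end
end

# Stand-in demo with `sleep`; swap in the headless Chrome argv in practice.
with_process("sleep", "30") { |pid| puts "working with process #{pid}" }
```

This only signals the process that was spawned directly. Chrome normally shuts down its child processes when the main process receives `TERM`, but that behaviour is an assumption worth verifying in your environment.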

  • Parallelize with browsers, not pages:

Given that you should only run a browser when absolutely necessary, the next best practice is to run only one session through each browser. While you might actually save some overhead by parallelizing work through pages, if one page crashes it can bring down the entire browser with it. On top of that, each page isn't guaranteed to be totally clean (cookies and storage might bleed through).

  • page.waitForNavigation:

One of the most common issues involves actions that trigger a page load and the sudden loss of your script's execution. This happens because actions that trigger a page load can often cause subsequent work to get swallowed. To get around this, you generally have to invoke the page-loading action and immediately wait for the next page load.

  • Use docker to contain it all:

Chrome needs a lot of dependencies to run properly. Even once those are installed, there are things like fonts and phantom processes to worry about, so it's ideal to use some sort of container to hold it all. Docker is almost custom-built for this task, as you can limit the amount of resources available and sandbox the browser. You can write your own Dockerfile for this.

And to avoid running into zombie processes (which commonly happen with Chrome), you'll want to use something like dumb-init to start up properly.
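
The container advice can be sketched as a `docker run` invocation built from Ruby. This is a hedged sketch, not a definitive setup: it uses Docker's `--memory` and `--cpus` flags for the resource limits, and it swaps in Docker's built-in `--init` flag as an alternative to dumb-init for reaping zombie processes. The helper name `docker_chrome_command`, the limit values, and the assumption that the image listens on port 9222 are all illustrative; the image name in the usage line is the one linked elsewhere in this thread.

```ruby
# Builds a `docker run` argv with memory and CPU limits and an init process.
# Assumptions: the helper name and default limits are hypothetical, and the
# chosen image is assumed to expose Chrome's debugging port on 9222.
def docker_chrome_command(image:, memory: "1g", cpus: "1.0", debug_port: 9222)
  [
    "docker", "run", "--rm",
    "--init",                   # Docker's bundled init reaps zombie processes
    "--memory=#{memory}",       # memory limit for the container
    "--cpus=#{cpus}",           # CPU limit, expressed in number of cores
    "-p", "#{debug_port}:9222", # publish the (assumed) debugging port
    image
  ]
end

cmd = docker_chrome_command(image: "justinribeiro/chrome-headless", memory: "2g")
puts cmd.join(" ")
# To actually launch it: pid = Process.spawn(*cmd)
```

Depending on the image, additional flags (for example, sandbox-related capabilities) may be required; check the image's own documentation before relying on this command.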

  • Two different runtimes:

There can be two JavaScript runtimes going on (Node and the browser). This is great for the purposes of shareability, but it comes at the cost of confusion since some page methods will require you to explicitly pass in references (versus doing so with closures or hoisting).

As an example, page.evaluate, deep down in the bowels of the protocol, literally stringifies the function and passes it into Chrome, so things like closures and hoisting won't work at all. If you need to pass some references or values into an evaluate call, simply append them as arguments, which get handled properly.

Reference: Observations running 2 million headless sessions

guneysus
undetected Selenium
  • Hello sir. So I can't rely running chromedriver in windows in headless mode in production environment. Can I trust running "normal" mode? I got all my memory full running 3 instances of chromedriver (8gb) for 1 hour in headless mode. Do you know if when I run chromedriver through web server (all calls became "windowless") it runs in normal or headless mode? If I need to run like 50 instances of chromedriver at the same time, how do you recommend I do it? – Tatiana Perere Jul 17 '20 at 22:50
  • @TatianaPerere Of course you can rely on running chromedriver on Windows in headless mode. I just provided some insights. – undetected Selenium Jul 17 '20 at 22:52
  • How can I, if it crashed my computer due modified memory overflow running 3 chromedriver instances? I need to run much more instances. Do you think I can achieve better results with docker? – Tatiana Perere Jul 18 '20 at 01:39
  • 1
    Never cropped into my mind why I had to continue to use Selenium and Chromium. If I could upvote this a million times I would! – NemyaNation Jan 22 '21 at 07:18
0

Consider using Docker. It has well-documented features for capping the usage of system resources such as memory and CPU. The good news is that it's pretty easy to build a Docker image with headless Chrome (on top of X11) inside it.

There are lots of out-of-the-box solutions for that; for example: https://hub.docker.com/r/justinribeiro/chrome-headless/

Beastmaster
  • 2
    You don't need to use Docker just to control memory and CPU. These features come for free with `cgroups`, which are now the default everywhere, and you have easy knobs in systemd to use them, per service or for a specific run of an application. – Patrick Mevzek Feb 22 '21 at 21:50