
I've been looking at the following article about Headless Chrome:
https://developers.google.com/web/updates/2017/04/headless-chrome

I just upgraded Chrome on Windows 10 to version 60, but when I run either of the following commands from the command line, nothing seems to happen:

chrome --headless --disable-gpu --dump-dom https://www.google.com/
chrome --headless --disable-gpu --print-to-pdf https://www.google.com/

And I'm running all of these commands from the following path (the default installation path for Chrome on Windows):

C:\Program Files (x86)\Google\Chrome\Application\

When I run the commands, something seems to process for a second, but I don't actually see anything. What am I doing wrong?
Thanks.


Edit:

As noted by Mark Rajcok, if you add --enable-logging to the --dump-dom command, it works. Also, the --print-to-pdf command works as well in Chrome 61.0.3163.79, but you'll probably have to specify a different path for the output file in order to have the necessary permissions to save it.

As such, the following two commands worked for me:

"C:\Program Files (x86)\Google\Chrome\Application\chrome" --headless --disable-gpu --enable-logging --dump-dom https://www.google.com/
"C:\Program Files (x86)\Google\Chrome\Application\chrome" --headless --disable-gpu --print-to-pdf=D:\output.pdf https://www.google.com/

I guess the next step is being able to step through the dumped DOM like PhantomJS with DOM selectors and whatnot, but I suppose that's a separate question.
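Stepping through the dumped DOM is a separate question, but as a starting point both commands above can be run from a script, with the --dump-dom output handed to a parser. This is a minimal Python sketch. It assumes Chrome is at the default path from the question, and the helper names are hypothetical. The standard-library html.parser is only a simple stand-in for real DOM selectors; Chrome does not provide it.

```python
import subprocess
from html.parser import HTMLParser

# Default install path from the question (assumption: adjust for your machine).
CHROME = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"


def headless_args(url, pdf_path=None, chrome=CHROME):
    """Build the argument list for one of the two commands that worked.

    With pdf_path=None the DOM is dumped to stdout (with --enable-logging,
    which these Chrome versions needed); otherwise a PDF is written to
    pdf_path, which must be somewhere you have permission to write.
    """
    args = [chrome, "--headless", "--disable-gpu"]
    if pdf_path is None:
        args += ["--enable-logging", "--dump-dom"]
    else:
        args.append("--print-to-pdf=" + pdf_path)
    args.append(url)
    return args


def dump_dom(url, chrome=CHROME):
    """Run the --dump-dom command and return what Chrome printed to stdout."""
    result = subprocess.run(
        headless_args(url, chrome=chrome),
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


class BodyAttrs(HTMLParser):
    """Record the attributes of the first <body> tag seen."""

    def __init__(self):
        super().__init__()
        self.body_attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == "body" and self.body_attrs is None:
            self.body_attrs = dict(attrs)


def body_attributes(dom_text):
    """Return the <body> attributes of a DOM dump as a dict, or None."""
    parser = BodyAttrs()
    parser.feed(dom_text)
    parser.close()
    return parser.body_attrs
```

Usage would be something like `body_attributes(dump_dom("https://www.chromestatus.com"))` for a DOM dump, or `subprocess.run(headless_args("https://www.google.com/", pdf_path=r"D:\output.pdf"), check=True)` for a PDF. Passing a list means the spaces in `Program Files (x86)` need no extra quoting.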


Edit #2:

For what it's worth, I recently came across a Node API for Headless Chrome called Puppeteer (https://github.com/GoogleChrome/puppeteer), which is really easy to use and delivers all the power of Headless Chrome. If you're looking for an easy way to use Headless Chrome, I highly recommend it.

HartleySan
  • Just tried this in Chrome 61.0.3163.79, but it still doesn't work. – HartleySan Sep 06 '17 at 02:05
  • I've been having the same problem all evening. It may well be a different problem for you, but in my case it was a question of having the relevant permissions to write a file in the Program Files directory. Trying just C:\output.pdf didn't work either, whereas c:\users\username\output.pdf works fine. Likewise, if you change the permissions on the folder '...application/chrome', plain --print-to-pdf with no further argument works fine. – tim-mccurrach Nov 10 '17 at 00:17
  • This also works: `--screenshot=C:\Temp\screenshot.png` – vladkras Dec 19 '17 at 21:10
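Following the permissions point in the comments above, a script can check that the output folder is actually writable before handing Chrome a --print-to-pdf= or --screenshot= path. This is a minimal Python sketch with hypothetical helper names. It assumes the user's home directory is an acceptable default. It tries to create a throwaway file rather than calling os.access(), because on Windows os.access() only looks at the read-only attribute and ignores folder permissions.

```python
import tempfile
from pathlib import Path


def can_write(folder):
    """Return True if a file can actually be created in folder."""
    try:
        with tempfile.TemporaryFile(dir=str(folder)):
            pass
        return True
    except OSError:  # missing folder, permission denied, etc.
        return False


def output_flag(kind, filename, folder=None):
    """Build --print-to-pdf=PATH or --screenshot=PATH in a writable folder.

    folder defaults to the user's home directory (e.g. C:\\Users\\username),
    which the comment above reports working where C:\\ did not.
    """
    if kind not in ("print-to-pdf", "screenshot"):
        raise ValueError(f"unsupported output flag: {kind}")
    folder = Path(folder) if folder is not None else Path.home()
    if not can_write(folder):
        raise PermissionError(f"cannot create files in {folder} (missing or no permission)")
    return f"--{kind}={folder / filename}"
```

For example, `output_flag("screenshot", "screenshot.png")` yields a --screenshot= argument pointing into your home folder, ready to be added to the Chrome command line.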

7 Answers


This works for me:

start chrome --enable-logging --headless --disable-gpu --print-to-pdf=c:\misc\output.pdf https://www.google.com/

... but only with "start chrome", with "--enable-logging", with an explicit path for the PDF, and only if the folder "misc" already exists on the C: drive.

Addition: the folder for the PDF ("c:\misc" above) can of course be replaced with any other existing folder.
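Because this answer reports that the PDF only shows up when the target folder already exists, a script can create that folder before launching Chrome. This is a minimal Python sketch. The helper name is hypothetical, the flags are the ones from the command above, and `chrome` is assumed to be resolvable (on PATH or given as a full path).

```python
from pathlib import Path


def pdf_command(url, pdf_path, chrome="chrome"):
    """Argument list matching this answer, creating the PDF's folder first.

    The answer reports the PDF only appears if the target folder (c:\\misc
    in the example) already exists, so make sure it does before Chrome runs.
    """
    pdf_path = Path(pdf_path)
    pdf_path.parent.mkdir(parents=True, exist_ok=True)
    return [chrome, "--enable-logging", "--headless", "--disable-gpu",
            f"--print-to-pdf={pdf_path}", url]
```

The returned list can be passed to `subprocess.run` or `subprocess.Popen`. The list form needs no extra quoting for folders with spaces in their names.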

Marrix

With Chrome 61.0.3163.79, if I add --enable-logging then --dump-dom produces output:

> "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --enable-logging --headless --disable-gpu --dump-dom https://www.chromestatus.com
<body class="loading" data-path="/features">
<app-drawer-layout fullbleed="">
...
</script>
</body>

If you want to programmatically control headless Chrome, here's one way to do it with Python 3 and Selenium:

In an Admin cmd window, install Selenium for Python:

C:\Users\Mark> pip install -U selenium

Download ChromeDriver v2.32 and extract it. I put the chromedriver.exe in C:\Users\Mark, which is where I put this headless.py Python script:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # remove this line if you want to see the browser pop up
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(options=options)
try:
    driver.get('https://www.google.com/')
    print(driver.page_source)
finally:
    driver.quit()  # don't skip this, or chromedriver.exe will keep running!

Run it in a normal cmd window:

C:\Users\Mark> python headless.py
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" ...
...  lots and lots of stuff here ...
...</body></html>
Mark Rajcok
  • Mark Rajcok, this may work, but I'm not going to do all of that to test it out. I have nothing against your answer, it's just that if Chrome can truly work as a headless browser now, you shouldn't have to go through all of this to get it to work. If it doesn't work immediately without any tweaking / outside programs, then I'll just keep using PhantomJS. Thanks. If other people want to test this solution and upvote you, that's fine. Thank you. – HartleySan Sep 07 '17 at 13:01
  • @HartleySan, I discovered it works if you add `--enable-logging`. I updated the answer. – Mark Rajcok Sep 09 '17 at 18:34
  • One minor note: To avoid having to place the script and the chromedriver.exe executable in the same folder, put chromedriver.exe somewhere in your %PATH% ($env:PATH if you use PowerShell). – Tommy Williams Nov 04 '17 at 02:12
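Following the %PATH% tip in the last comment, a script can confirm that chromedriver is actually findable before starting Selenium. This is a minimal Python sketch with a hypothetical helper name. It relies on shutil.which(), which searches PATH and, on Windows, also tries the PATHEXT extensions, so "chromedriver" matches chromedriver.exe.

```python
import shutil


def require_chromedriver(name="chromedriver", search_path=None):
    """Return the full path to chromedriver, or raise if it can't be found.

    search_path overrides PATH when given (a folder list separated by
    os.pathsep); by default the normal PATH is searched.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        raise FileNotFoundError(
            f"{name} not found; put its folder on PATH as suggested above"
        )
    return found
```

In Selenium 4 the returned path can be handed over explicitly via `selenium.webdriver.chrome.service.Service(path)` if you don't want to rely on PATH at all.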

Current versions (68-70) seem to require --no-sandbox in order to run; without it, they do absolutely nothing and hang in the background.

The full commands I use are:

chrome --headless --user-data-dir=tmp --no-sandbox --enable-logging --dump-dom https://www.google.com/ > file.html
chrome --headless --user-data-dir=tmp --no-sandbox --print-to-pdf=whatever.pdf https://www.google.com/

Using --no-sandbox is a pretty bad idea and you should use this only for websites you trust, but sadly it's the only way of making it work at all.

--user-data-dir=... uses the specified directory instead of the default one, which is likely already in use by your regular browser.
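The first command above, including the `> file.html` redirect, can be reproduced from Python with a throwaway profile folder for --user-data-dir. This is a minimal sketch with hypothetical helper names. It assumes `chrome` is on PATH and keeps the flags in the same order as the answer.

```python
import subprocess
import tempfile


def sandboxless_args(url, profile_dir, chrome="chrome"):
    """Arguments for this answer's --dump-dom command.

    profile_dir goes to --user-data-dir so this instance doesn't collide
    with the profile your regular browser already has open. --no-sandbox
    weakens Chrome's isolation, so only use it for sites you trust.
    """
    return [
        chrome,
        "--headless",
        f"--user-data-dir={profile_dir}",
        "--no-sandbox",
        "--enable-logging",
        "--dump-dom",
        url,
    ]


def dump_dom_to_file(url, out_path, chrome="chrome"):
    """Rough equivalent of `chrome ... --dump-dom URL > file.html`.

    Uses a fresh temporary profile folder. On Windows, deleting it may
    fail if a Chrome child process still has files in it open.
    """
    with tempfile.TemporaryDirectory() as profile_dir:
        with open(out_path, "wb") as out:
            subprocess.run(sandboxless_args(url, profile_dir, chrome),
                           stdout=out, check=True)
```

Using a fresh temporary folder instead of a fixed `tmp` means each run starts from a clean profile. The trade-off is that nothing such as cookies or cache carries over between runs.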

However, if you're trying to make a PDF from HTML, then this is fairly useless, since you can't remove the header and footer (containing text like file:///...), so the only viable solution is to use Puppeteer.

blade
  • Thanks for that info, blade. I honestly feel like Headless Chrome is still pretty much useless, but maybe I just haven't explored it enough yet. Or rather, I still haven't found a viable replacement for PhantomJS. – HartleySan Aug 09 '18 at 09:56

You should be good. Check under the Chrome version directory:

C:\Program Files (x86)\Google\Chrome\Application\60.0.3112.78

For the command

chrome --headless --disable-gpu --print-to-pdf https://www.google.com/

the PDF is written to:

C:\Program Files (x86)\Google\Chrome\Application\60.0.3112.78\output.pdf

Edit: You still execute the commands from the folder that contains the chrome executable, in this instance:

 C:\Program Files (x86)\Google\Chrome\Application\
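To check whether a path-less --print-to-pdf run left its file in one of the per-version folders, as this answer describes for Chrome 60, a script can search them. The comments below report that the asker found no such file, so treat this as a way to look, not a guarantee. This is a minimal Python sketch with a hypothetical helper name. It assumes the default install folder from the question.

```python
from pathlib import Path

# Default Chrome folder on Windows from the question (assumption: adjust if needed).
APP_DIR = Path(r"C:\Program Files (x86)\Google\Chrome\Application")


def find_default_pdfs(app_dir=APP_DIR, name="output.pdf"):
    """List output.pdf files directly inside the per-version subfolders.

    Per this answer, --print-to-pdf without a path wrote to e.g.
    Application\\60.0.3112.78\\output.pdf in Chrome 60.
    """
    app_dir = Path(app_dir)
    return sorted(p for p in app_dir.glob(f"*/{name}") if p.is_file())
```

An empty list means nothing was written there; in that case, giving --print-to-pdf= an explicit path in a folder you can write to is the more reliable route.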
Karl L
  • I have that folder, but there's no chrome executable in it. I get the following error: `'chrome' is not recognized as an internal or external command, operable program or batch file.` – HartleySan Jul 29 '17 at 01:25
  • Yeah, still execute in the context where the chrome executable is in `C:\Program Files (x86)\Google\Chrome\Application` I found the file shows up in `C:\Program Files (x86)\Google\Chrome\Application\60.0.3112.78\ ` Make sense? – Karl L Jul 31 '17 at 16:41
  • I understand what you're saying now, but when I execute the following command from the following path, I don't have any output.pdf file under `60.0.3112.78` or anywhere: `chrome --headless --disable-gpu --print-to-pdf https://www.google.com/`; C:\Program Files (x86)\Google\Chrome\Application\ – HartleySan Jul 31 '17 at 22:04
  • I have the same problem, something "happens" but there is no evidence of it – Novaterata Aug 01 '17 at 14:57

I know this question is for Windows, but since Google gives this post as the first search result, here's what works on Mac:

Mac OS X

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --dump-dom 'http://www.google.com'

Note that you MUST include the http:// scheme or it won't work.
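The "must include http" rule can be enforced in a script by adding a scheme when none was given. This is a minimal Python sketch with hypothetical helper names. It assumes plain http:// is an acceptable default and uses the Mac Chrome path from the command above. Passing a list avoids the backslash-escaped spaces needed in the shell.

```python
from urllib.parse import urlsplit

MAC_CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def with_scheme(url, default="http"):
    """Prefix a scheme if the URL has none, since --dump-dom needs it."""
    return url if urlsplit(url).scheme else f"{default}://{url}"


def mac_dump_dom_args(url):
    """Arguments for the Mac --dump-dom command, with the scheme ensured."""
    return [MAC_CHROME, "--headless", "--dump-dom", with_scheme(url)]
```

The list can be passed straight to `subprocess.run(..., capture_output=True, text=True)` to get the DOM as a string.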

Further tips

To indent the HTML (highly desirable for real pages, which tend to be bloated), pipe it through tidy:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --dump-dom 'http://www.google.com' | tidy

You can get tidy with:

brew install tidy
Sridhar Sarnobat

If you want to sidestep the problem entirely and just use a service to do the work for you: I'm the author/founder of browserless, which attempts to tackle running headless Chrome in a service-like fashion. Otherwise, it's pretty tough to keep up with the changes and to make sure all the appropriate packages and resources are installed to get Chrome running, but it's definitely doable.

browserless
  • griffith_joel, to be totally honest, even though I was able to get Headless Chrome working, it was too much effort to actually use for real work, so I ended up just going back to PhantomJS. Anyway, browserless looks cool, and I'll check it out. Thanks. – HartleySan Nov 11 '17 at 14:52
  • What kind of things are you trying to do? PhantomJS is much easier to get going, for sure, but having it execute anything of substance tends to cause it to crash. – browserless Nov 11 '17 at 17:35
  • Basic scraping of structure and data from the DOM of sites. Also, yes, it was slow and all that, which was annoying, but it's what worked at the time. – HartleySan Nov 11 '17 at 18:26

I solved it by running this in PowerShell (from the directory containing chrome.exe):

start-process chrome -ArgumentList "--enable-logging --headless --disable-gpu --print-to-pdf=c:\users\output.pdf https://www.google.com/"

You can choose your own output path: --print-to-pdf=<<custom path>>

Devin Y