14

I am trying to use the headless feature of Chrome to convert an HTML page to PDF. However, I am not getting any output at all, and the console doesn't show any error either. I am running the command below on my Windows machine.

chrome --headless --disable-gpu --print-to-pdf

I tried all the various options, but nothing is generated. I am using Chrome version 60.

user2580925

8 Answers

18

Command Line --print-to-pdf

By default, --print-to-pdf attempts to create a PDF in the User Directory. By default, that user directory is where the actual chrome binary is stored, which is the specific version folder for the version you're running - for example, "C:\Program Files (x86)\Google\Chrome\Application\61.0.3163.100". And, by default... Chrome is not allowed to write to this folder. You can watch it try, and fail, by adding --enable-logging to your command.

So unfortunately, by default, this command fails.*
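
The failure described above comes down to whether Chrome can write to the target folder. Here is a minimal sketch for checking that before invoking Chrome, assuming a POSIX-style shell (for example Git Bash on Windows, where the shell's writability test is only an approximation of the real Windows permissions). The name can_write_to is a hypothetical helper, not part of Chrome:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Chrome): succeed only if the given
# directory exists and the current user may write to it.
can_write_to() {
    [ -d "$1" ] && [ -w "$1" ]
}

# Check the directory given as the first argument, defaulting to the
# current directory.
target_dir="${1:-$PWD}"
if can_write_to "$target_dir"; then
    echo "ok: $target_dir is writable"
else
    echo "not writable: $target_dir" >&2
fi
```

If the check fails for the folder Chrome would write to, pass a different location explicitly with --print-to-pdf=... as described next.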

You can solve this either by providing a path in the argument where Chrome can write, like

--print-to-pdf="C:\Users\Jane\test.pdf"

Or, you can change the User Directory:

--user-data-dir="C:\Users\Jane"

One reason you might prefer to change the User Directory is if you want the PDF to automatically receive its name from the webpage. Chrome looks at the title tag and derives the filename from it: <title>My Page</title> => My-Page.pdf

*I think this default behavior is super confusing and should be filed as a bug against Chrome. However, apparently part of the Chrome team is outright opposed to the mere existence of this command-line option, and believes it would be better to remove the flag outright and force everyone using it to get a node.js build going with Puppeteer.

Limitations of Command Line on Windows

Invoking Chrome this way works fine in, for example, a local dev environment on IIS Express with Visual Studio. But it will fail, even in headless mode, on a server running IIS, because IIS users are not given interactive/desktop permissions, and the way Chrome grabs this PDF actually requires them. There are complicated ways to provide those permissions, but any place you read up on how begins with DON'T PROVIDE INTERACTIVE/DESKTOP PERMISSIONS. Further, the risk noted above of Chrome one day getting rid of the command-line option makes working even harder to get it running an iffy proposition.

Alternatives to chrome command line

wkhtmltopdf

Judging by comments in the source code, the Chrome team may have used or based its work on wkhtmltopdf. I haven't tried it, but it will likely get the job done. The one minor risk: when producing PDFs in Chrome, testing is obvious (view the page in Chrome, and open Print Preview if you're nervous). wkhtmltopdf renders with its own engine (Qt WebKit) rather than your installed Chrome, so it may produce rendering differences. As a Community user noted, wkhtmltopdf was archived by its owner on Jan 2, 2023.

Selenium

Another alternative is to get ahead of the group looking to get rid of --print-to-pdf and use the browser dev API (via Selenium) as they prefer.**

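using System;
using System.Collections.Generic;

private static void pdfSeleniumImpl(string url, string pdfPath)
{
    var options = new OpenQA.Selenium.Chrome.ChromeOptions();
    // Per the comments below, headless mode matters here: without it,
    // Chrome reports "PrintToPDF is not implemented".
    options.AddArgument("headless");

    using (var chrome = new OpenQA.Selenium.Chrome.ChromeDriver(options))
    {
        // Navigate to the page to print.
        chrome.Url = url;

        // An empty dictionary leaves all print settings at their defaults.
        var printToPdfOpts = new Dictionary<string, object>();
        var resultDict = (Dictionary<string, object>)
            chrome.ExecuteChromeCommandWithResult(
                "Page.printToPDF", printToPdfOpts);
        // The response carries the PDF as a base64 string under "data".
        dynamic result = new DDict(resultDict);
        string data = result.data;
        var pdfFile = Convert.FromBase64String(data);
        System.IO.File.WriteAllBytes(pdfPath, pdfFile);
    }
}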

The DDict above is the GracefulDynamicDictionary from another of my answers.

https://www.nuget.org/packages/GracefulDynamicDictionary/

https://github.com/b9chris/GracefulDynamicDictionary

https://stackoverflow.com/a/24192518/176877

Ideally this would be async, since all the calls to Selenium are actually network commands, and writing that file could take a lot of disk I/O. The data returned from Chrome is actually a Stream as well. Unfortunately, Selenium's conventionally used library does not use async at all, so doing this right would take upgrading that library or identifying a solid async Selenium library for .NET.

Limitations to any Chrome-based approach

Any approach here that uses Chrome on a server, including Selenium, is going to have to deal with Chrome auto-updating, and the Selenium drivers needing to be updated as well as part of your build. Rarely updated code without a strategy to cope with this will break every ~3 months.

https://github.com/puppeteer/puppeteer/blob/master/lib/Page.js#L1007

https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF

**The Page.pdf Chrome Dev API command is also deprecated, so if that contingent gets its way, neither the command line nor the Dev API will work. That said, it looks like those lobbying to wreck it gave up 2 years ago.

Chris Moschini
  • Don't understand why full path is required for files in current directory - eg: `C:\Users\User\Documents\XstReader>"C:\Program Files (x86)\Google\Chrome\Application\chrome" --headless --disable-gpu --print-to-pdf=C:\Users\User\Documents\XstReader\DemoEmail.pdf --no-margins "C:\Users\User\Documents\XstReader\Demo Email.html"` – flywire Dec 04 '19 at 04:45
  • Because Chrome ignores the current directory, instead using the user data dir. – Chris Moschini Dec 05 '19 at 17:22
  • this is a great answer, should get at least 100 votes. Especially the Selenium Function. Thanks sm @ChrisMoschini – Jeanno Apr 10 '20 at 20:56
  • 4
    "*Behind the scenes Chrome simply uses wkhtmltopdf.*" - citation needed. – Bergi Sep 07 '20 at 17:03
  • 1
    Many thanks for the Selenium tip! Please note that the "headless" option is important, as otherwise by default the necessary extension will be disabled in Chrome, causing a "PrintToPDF is not implemented" error. (Alternatively, it can presumably be explicitly enabled for non-headless mode, but I have not tried that.) – Otto G Sep 19 '20 at 13:48
  • @Bergi when I was digging into this long ago, wkhtmltopdf was referenced in the comments of the code. But, I have not looked at this code in 4 years - the PDF generator has worked as intended in that time – Chris Moschini Apr 18 '23 at 21:27
8

This is working:

chrome --headless --disable-gpu --print-to-pdf=file1.pdf https://www.google.co.in/

This creates the file in the folder C:\Program Files (x86)\Google\Chrome\Application\61.0.3163.100.

0xdb
SURAJ
3

I was missing the "=" after the --print-to-pdf switch.

The correct command is:

chrome --headless --disable-gpu --print-to-pdf="C:/temp/name.pdf" https://www.google.com/

Now it is working.

Moshe Slavin
user2580925
  • Don't use answers as comments – N-ate Aug 08 '18 at 16:28
  • 1
    This is the correct answer, it needs the full path of the file in the print to pdf, or does not work on windows as of 5/27/19 – Max May 27 '19 at 22:30
  • When you specify a full path, it fails. Filename only it works. To have the effect of specifying fullpath, you need to split the path and set working directory to your destination path – TheRealChx101 Mar 19 '23 at 15:29
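
The workaround in the last comment (split the path, then run from the destination directory) can be sketched as follows. This is only an illustration, assuming a POSIX-style shell. CHROME_BIN and print_pdf_in_dir are hypothetical names, and whether a full path or a bare filename works appears to vary between Chrome versions, as the comments above show:

```shell
#!/usr/bin/env bash
# Hypothetical variable: the browser binary to run (defaults to "chrome").
CHROME_BIN="${CHROME_BIN:-chrome}"

# Hypothetical helper: split TARGET into directory and filename, change
# into the directory, and pass only the filename to --print-to-pdf.
print_pdf_in_dir() {
    local url=$1 target=$2
    local dir name
    dir=$(dirname "$target")
    name=$(basename "$target")
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$dir" && "$CHROME_BIN" --headless --disable-gpu --print-to-pdf="$name" "$url" )
}
```

A call would look like: print_pdf_in_dir https://www.google.com/ /tmp/name.pdf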
2

Extending the brilliantly simple answer by SURAJ, I created a small function that lives in a file I source from my shell profile, so it works like a CLI tool:

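function webtopdf(){
    # $1 = URL to print, $2 = output PDF path; quoted so spaces and
    # special characters in either argument survive word splitting.
    chromium-browser --headless --disable-gpu --print-to-pdf="$2" "$1"
}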

so a quick

webtopdf https://goo.com/some-article some-article.pdf

does the job for me now
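
A slightly more defensive variant of the function above, offered only as a sketch: it checks the argument count and anchors a relative output name to the current directory, since other answers here report that Chrome may otherwise write next to its own binary. CHROME_BIN and webtopdf_safe are hypothetical names, and chromium-browser is assumed to be on the PATH:

```shell
#!/usr/bin/env bash
# Hypothetical variable: lets you swap in another browser binary.
CHROME_BIN="${CHROME_BIN:-chromium-browser}"

webtopdf_safe() {
    if [ "$#" -ne 2 ]; then
        echo "usage: webtopdf_safe URL OUTPUT.pdf" >&2
        return 2
    fi
    local url=$1 out=$2
    # Leave absolute paths alone; prefix relative ones with the current
    # directory so the output location does not depend on the browser.
    case "$out" in
        /*) ;;
        *)  out="$PWD/$out" ;;
    esac
    "$CHROME_BIN" --headless --disable-gpu --print-to-pdf="$out" "$url"
}
```

Usage is the same as before: webtopdf_safe https://goo.com/some-article some-article.pdf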

pascalwhoop
2

Do not forget to open your terminal/cmd with admin rights :) Otherwise it will just not save the file at all.

Dobromir Hristov
  • 1
    Just work from directory that does not require admin rights (for example Chrome installation dir may be resctricted) – Paul Verest Apr 18 '23 at 09:42
2

This worked for me on Windows:

start chrome --headless --disable-gpu --print-to-pdf=C:\Users\username\pdfs\chrome.pdf --no-margins https://www.google.com

Vipin CP
  • 2
    You can use in Powershell (and in GitBash) `--print-to-pdf="$(pwd)\output.pdf"` to print in the current folder. For me `--no-margins` have no effect. – Kpym Jan 25 '20 at 11:36
0

For Windows users (and others with MSEdge), a similar function is provided by MSEdge --headless. In addition, version III+ has a "With Acrobat" render.

NOTE: Google Chromium split headless mode into --headless=new and --headless=old, and the two use different switches! With =new, use --no-pdf-header-footer; with =old, use --print-to-pdf-no-header.

NOTE: As of version 112, Edge still does NOT respect --headless=new.
Newer --switches can be found at https://peter.sh/experiments/chromium-command-line-switches/

Currently MSEdge treats --headless as if it were --headless=old, so it still uses the older -header syntax. Note that --headless --print-to-pdf-no-header suppresses the footer as well as the header.
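
The mapping between headless mode and header switch described above can be captured in a small helper. This is a sketch for a POSIX-style shell, and no_header_switch is a hypothetical name. The switch names are the ones quoted in this answer and may change in later browser versions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the value after --headless= ("new" or "old"),
# print the matching switch that suppresses the PDF header and footer.
no_header_switch() {
    case "$1" in
        new) echo "--no-pdf-header-footer" ;;
        old) echo "--print-to-pdf-no-header" ;;
        *)   echo "unknown headless mode: $1" >&2; return 1 ;;
    esac
}

# Per this answer, Edge (as of version 112) behaves like the old mode.
mode=old
echo "--headless=$mode $(no_header_switch "$mode")"
```

The echoed pair of switches can then be pasted into the msedge or chrome command line.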

There is NO NEED to set a profile but you can via

"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" --profile-directory=c:\whateverUneed --headless blah blah

There should be no need to use any GPU hotfix; those issues were resolved on Windows 5 years ago.

So the normal everyday command can be the following, where CWD stands for the path to any current working directory:

"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" --headless=old  --print-to-pdf-no-header --print-to-pdf="c:\CWD\google.pdf" "https://google.com"


K J
-3

Currently, this is only available for Linux and Mac OS.