How to get Chrome headless output into memory efficiently with C#?

Question

Upon request, my ASP.NET server should convert an HTML file to PDF using a headless Chrome instance and return the resulting PDF.

CMD command:

```
chrome --headless --disable-gpu --print-to-pdf-no-header --print-to-pdf="[pdf-file-path]" --no-margins "[html-file-path]"
```

The PDF file is awkward to deal with: the server has to clean up the PDF file from the previous request, detect when the new PDF has been created, and then read the file into memory. All of this is too slow.
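Much of that bookkeeping can be avoided even while staying with the command line. A minimal sketch, assuming .NET 5 or later (for `Process.WaitForExitAsync`) and a headless Chrome that exits once `--print-to-pdf` has written its file; the `ChromePdf` class, the `HtmlFileToPdfAsync` method and the Chrome path are hypothetical. Each request writes to its own temporary file, waiting for the process to exit replaces watching for the file to appear, and the file is deleted as soon as it has been read:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

public static class ChromePdf
{
    // Hypothetical install location; adjust for the server.
    private const string ChromePath = @"C:\Program Files\Google\Chrome\Application\chrome.exe";

    public static async Task<byte[]> HtmlFileToPdfAsync(string htmlFilePath)
    {
        // A unique file per request: nothing left over from a previous request to clean up,
        // and concurrent requests cannot overwrite each other's output.
        string pdfPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".pdf");

        var startInfo = new ProcessStartInfo(ChromePath)
        {
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        // Flags copied from the question's command line; ArgumentList takes care of quoting.
        startInfo.ArgumentList.Add("--headless");
        startInfo.ArgumentList.Add("--disable-gpu");
        startInfo.ArgumentList.Add("--print-to-pdf-no-header");
        startInfo.ArgumentList.Add("--print-to-pdf=" + pdfPath);
        startInfo.ArgumentList.Add("--no-margins");
        startInfo.ArgumentList.Add(htmlFilePath);

        try
        {
            using var process = Process.Start(startInfo)
                ?? throw new InvalidOperationException("Could not start Chrome.");
            // Waiting for Chrome to exit replaces polling for the PDF to be created.
            await process.WaitForExitAsync();
            return await File.ReadAllBytesAsync(pdfPath);
        }
        finally
        {
            // Delete this request's file immediately instead of on the next request.
            if (File.Exists(pdfPath)) File.Delete(pdfPath);
        }
    }
}
```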

Is there a better solution to this? Could I get the file directly into memory somehow? Or manage the PDF file better?

Why don't you just use an open-source HTML to PDF library? https://stackoverflow.com/questions/564650/convert-html-to-pdf-in-net — Camilo Terevinto, Sep 25 '21 at 13:16
@CamiloTerevinto thank you for the response. It does seem logical. However, after weeks of dealing with the hell that is PDF libraries, I decided to take this route. Each one has its own problems, and I do not want to pay for one yet. — M. Azyoksul, Sep 25 '21 at 13:22

score 1 · Answer 1 · answered Sep 25 '21 at 17:52

I would consider several options.

Print the output to a PostScript printer.

Then take the PostScript output and use, say, Ghostscript to convert it to a PDF.
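Ghostscript can write the PDF to standard output instead of a file, which covers the "directly into memory" part of the question. A minimal sketch, assuming the PostScript has already been printed to a file and Ghostscript's console executable is on `PATH` as `gs` (on Windows it is typically `gswin64c`); the `GhostscriptPdf` class and its method are hypothetical:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

public static class GhostscriptPdf
{
    public static async Task<byte[]> PostScriptToPdfAsync(string postScriptPath, string gsExe = "gs")
    {
        var startInfo = new ProcessStartInfo(gsExe)
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true,
        };
        startInfo.ArgumentList.Add("-q");                 // no banner/status text mixed into stdout
        startInfo.ArgumentList.Add("-dNOPAUSE");
        startInfo.ArgumentList.Add("-dBATCH");
        startInfo.ArgumentList.Add("-sstdout=%stderr");   // PostScript-level output goes to stderr
        startInfo.ArgumentList.Add("-sDEVICE=pdfwrite");
        startInfo.ArgumentList.Add("-sOutputFile=-");     // "-" sends the PDF to stdout
        startInfo.ArgumentList.Add(postScriptPath);

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start Ghostscript.");

        // Copy the binary stdout stream straight into memory; no PDF file is written.
        using var pdf = new MemoryStream();
        await process.StandardOutput.BaseStream.CopyToAsync(pdf);
        await process.WaitForExitAsync();

        if (process.ExitCode != 0)
            throw new InvalidOperationException($"Ghostscript exited with code {process.ExitCode}.");
        return pdf.ToArray();
    }
}
```

Reading stdout to the end before waiting for exit avoids the classic deadlock where the child blocks on a full pipe buffer.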

Probably even better: use the .NET PdfSharp library together with some code that renders HTML on top of it.

Consider this:

https://www.nuget.org/packages/HtmlRenderer.PdfSharp/1.5.1-beta1
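A minimal sketch of that route, assuming the `HtmlRenderer.PdfSharp` package linked above, whose `PdfGenerator.GeneratePdf` helper returns a PdfSharp `PdfDocument`; the `HtmlRendererPdf` class is hypothetical. Saving into a `MemoryStream` keeps the PDF in memory:

```csharp
using System.IO;
using PdfSharp;                          // PageSize
using PdfSharp.Pdf;                      // PdfDocument
using TheArtOfDev.HtmlRenderer.PdfSharp; // PdfGenerator

public static class HtmlRendererPdf
{
    public static byte[] HtmlToPdf(string html)
    {
        // Lay out the HTML and render it into a PdfSharp document on A4 pages.
        PdfDocument document = PdfGenerator.GeneratePdf(html, PageSize.A4);

        // Save to a MemoryStream rather than a file, so nothing touches the disk.
        using var stream = new MemoryStream();
        document.Save(stream);
        return stream.ToArray();
    }
}
```

This renderer is not a browser engine, so its HTML/CSS support is much narrower than Chrome's.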

Albert D. Kallal

PdfSharp looks like it's dead... – NobleGuy Jun 07 '23 at 08:58

M. Azyoksul · Accepted Answer · 2021-10-02T02:38:03.990

Stop using Chrome through the command-line interface and drive it from C# with a browser-automation library such as Selenium or Puppeteer instead. For Selenium, use the following NuGet package:

https://www.nuget.org/packages/Selenium.WebDriver/4.0.0-rc2

Then you can print your HTML to PDF using the following code:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using OpenQA.Selenium.Chrome;

// `html` (the HTML string) and `webdriverPath` (folder containing chromedriver) are defined elsewhere.

// Base64-encode the HTML so it can be loaded as a data: URL
var textBytes = Encoding.UTF8.GetBytes(html);
var b64Html = Convert.ToBase64String(textBytes);

// Create a headless Chrome driver
var chromeOptions = new ChromeOptions();
chromeOptions.AddArguments(new List<string> { "no-sandbox", "headless", "disable-gpu" });
using var driver = new ChromeDriver(webdriverPath, chromeOptions);
// A little magic here. Refer to: https://stackoverflow.com/a/52498445/7279624
driver.Navigate().GoToUrl("data:text/html;base64," + b64Html);

// Print
var printOptions = new Dictionary<string, object> {
    // Docs: https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
    // paperWidth/paperHeight are in inches; A4 is 210 x 297 mm
    { "paperWidth", 210 / 25.4 },
    { "paperHeight", 297 / 25.4 },
};
var printOutput = driver.ExecuteChromeCommandWithResult("Page.printToPDF", printOptions) as Dictionary<string, object>;
// "data" holds the PDF as a base64 string; decoding it gives the PDF bytes in memory
var document = Convert.FromBase64String(printOutput["data"] as string);
```
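The answer names Puppeteer as an alternative but only shows Selenium. A comparable sketch with the PuppeteerSharp NuGet package, assuming a recent version and an existing Chrome install (the path and the `PuppeteerPdf` class are hypothetical). `page.PdfDataAsync` returns the PDF as a `byte[]`, so neither a file nor a base64 round trip is involved:

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;
using PuppeteerSharp.Media;

public static class PuppeteerPdf
{
    // Hypothetical install location; adjust for the server.
    private const string ChromePath = @"C:\Program Files\Google\Chrome\Application\chrome.exe";

    public static async Task<byte[]> HtmlToPdfAsync(string html)
    {
        await using var browser = await Puppeteer.LaunchAsync(new LaunchOptions
        {
            Headless = true,
            ExecutablePath = ChromePath,
        });
        await using var page = await browser.NewPageAsync();

        // Load the HTML string directly; no data: URL or temporary HTML file needed.
        await page.SetContentAsync(html);

        // Returns the PDF bytes instead of writing a file.
        return await page.PdfDataAsync(new PdfOptions { Format = PaperFormat.A4 });
    }
}
```

Launching a browser per call keeps the sketch short; on a busy server it is probably cheaper to keep one browser alive and open a new page per request.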

The point is not the command-line interface; it is headless mode. Selenium and Puppeteer drive Chrome in either headless or full mode, so using Chrome directly is a shortcut. — PIoneer_2, Mar 05 '23 at 16:02
