
I'm trying to convert an HTML document to PDF. I tried multiple tools such as iTextSharp, OpenHtmlToPdf, etc., but the output file doesn't contain the text of the HTML.

Input File: https://wetransfer.com/downloads/49dbb404cf25f36dc5d1cbcfe0e1491820210523120756/47bb00

Output File: https://wetransfer.com/downloads/7e44ec94f42eb5a6bb9e4d2d986a820d20210523120732/0f3799

Can someone please help me? I've been trying to do this for a week and I haven't found a solution that works.

I tried something like this:

using System.IO;
using System;
using NReco.PdfGenerator;

namespace test
{
    class te
    {
        static void Main(string[] args)
        {
            var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
            htmlToPdf.GeneratePdfFromFile(@"C:/Temp/input.html", null, @"C:/Temp/export.pdf");
        }
    }
}

// Second attempt, using OpenHtmlToPdf
using System.IO;
using System;
using OpenHtmlToPdf;

namespace test
{
    class te
    {
        static void Main(string[] args)
        {
            string html = File.ReadAllText(@"C:/Temp/input.html");
            var pdf = Pdf.From(html);
            byte[] content = pdf.Content();
            File.WriteAllBytes(@"C:/Temp/Test.pdf", content);
        }
    }
}
Janik313
  • Kindly share the code that you have tried so far. – G K May 23 '21 at 13:06
  • @GK I edited the Question and added some examples of what I tried. I couldn't find more than two, but the code of the other ones was kinda similar. – Janik313 May 24 '21 at 10:15
  • I don't know if you are willing to try other tools to convert to PDF: https://stackoverflow.com/questions/564650/convert-html-to-pdf-in-net/57810379#57810379 – Mauricio Gracia Gutierrez May 24 '21 at 15:24
  • I cannot access the files you are sharing. I suggest that you put them in a public Google Drive folder or find another file host. Have you tried catching exceptions from the conversion process? – Mauricio Gracia Gutierrez May 24 '21 at 18:09

1 Answer


This is not a programmable, cross-platform solution as such. However, most browsers can save a page as PDF, and Edge can even be scripted to print headless to Microsoft Print to PDF.
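
Since the question is in C#, the headless route can also be started from code. The following is only a minimal sketch, not a tested conversion of the sample file: it launches Edge with the Chromium `--headless` and `--print-to-pdf` switches, which write the PDF directly instead of going through the Microsoft Print to PDF driver. The class name `EdgePdf`, its method names, and the Edge install path are assumptions made for illustration; adjust the path to wherever `msedge.exe` lives on your machine.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class EdgePdf
{
    // Assumed default install location of Edge; change it if yours differs.
    public const string DefaultEdgePath =
        @"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe";

    // Builds the command-line arguments for a headless print.
    // Kept separate from the launch so the string can be inspected or logged.
    public static string BuildArguments(string htmlPath, string pdfPath)
    {
        // Edge expects a URL as input, so turn the local path into a file:// URI.
        string inputUri = new Uri(Path.GetFullPath(htmlPath)).AbsoluteUri;
        string outputPath = Path.GetFullPath(pdfPath);
        return $"--headless --print-to-pdf=\"{outputPath}\" \"{inputUri}\"";
    }

    // Starts Edge, waits for it to finish, and returns the process exit code.
    public static int PrintToPdf(string htmlPath, string pdfPath,
                                 string edgePath = DefaultEdgePath)
    {
        var startInfo = new ProcessStartInfo(edgePath, BuildArguments(htmlPath, pdfPath))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

With the paths from the question, the call would look like `EdgePdf.PrintToPdf(@"C:\Temp\input.html", @"C:\Temp\export.pdf");`. Checking afterwards that the output file exists is a more reliable success test than the exit code alone.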

If it has taken a week, I would not have looked for a complex solution. Just use or borrow the nearest Windows 10 PC and simply use Save as PDF on the page as rendered in Edge (Chromium-based, with Skia output).

The results are the best I have ever seen, compared with the output of hundreds of converter programs. However, I have to concede that not every visible text object is selectable. Those that are not are graphic objects, such as the alpha and pi symbols, which are embedded within the images.

If you wish to automate the task, you can use a script containing constructs such as the following. I will spare you the long string needed for your sample, but first:

curl -o local.htm remote.html
RUNDLL32.EXE MSHTML.DLL,PrintHTML "local.htm"

This lets you select the PDF driver and manually tweak output settings such as page size.

For unattended batch usage, you can specify your preferred virtual or network printer, in my case "My MSPDF printer". For overly complicated examples, see https://www.robvanderwoude.com/printfiles.php

The result is just as good using one line:

:: Actual print command
START RUNDLL32.EXE MSHTML.DLL,PrintHTML %File2Print% %Printer%


K J
  • If this were a user support site for "how to convert HTML to PDF", I could agree with that approach, but this is a developer site; the expected answer should contain code, not a manual process. – Mauricio Gracia Gutierrez May 24 '21 at 15:19
  • Again, this is a developer forum, not a "how do I do this as an end user" site. There are plenty of good libraries that don't require a headless browser. Anyhow, I'm not comparing them; I'm just offering other approaches. – Mauricio Gracia Gutierrez May 24 '21 at 17:57