Convert html to pdf and merge it with existing pdfs

Question

I have a System.Net.Mail.MailMessage whose HTML body and PDF attachments should be converted into one single PDF.

Converting the html body to pdf works for me with this answer

Converting the pdf attachments into one pdf works for me with this answer

However, after ~10 hours of trying I cannot come up with a combined solution that does both. All I get are NullReferenceExceptions somewhere in the iText source, "the document is not open" errors, etc.

For example, this throws no error, but the resulting PDF contains only the attachments and not the HTML email body:

Document document = new Document();
StringReader sr = new StringReader(mail.Body);
HTMLWorker htmlparser = new HTMLWorker(document);
using (FileStream fs = new FileStream(targetPath, FileMode.Create))
{
    PdfCopy writer = new PdfCopy(document, fs);
    document.Open();
    htmlparser.Parse(sr);

    foreach (string fileName in pdfList)
    {
        PdfReader reader = new PdfReader(fileName);
        reader.ConsolidateNamedDestinations();
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            PdfImportedPage page = writer.GetImportedPage(reader, i);
            writer.AddPage(page);
        }
        PRAcroForm form = reader.AcroForm;
        if (form != null)
        {
            writer.CopyAcroForm(reader);
        }
        reader.Close();
    }
    writer.Close();
    document.Close();
}

I'm using the LGPL-licensed iTextSharp 4.1.6.

Your code uses `HTMLWorker`, which is rather limited in HTML to PDF conversion and is declared obsolete. I recommend that you use `XMLWorker` instead. You also use iTextSharp 4.1.6, which has been end-of-life since 2012, see http://stackoverflow.com/documentation/itext/3557/getting-started-with-itext#t=201703151411361159498&a=versions. The current version is 5.5.11, see https://www.nuget.org/packages/itextsharp. However, that is just a comment on your dependencies; it is not likely to be related to the issue at hand. — Amedee Van Gasse, Mar 23 '17 at 17:27
Can you share a sample MailMessage so any reader can copy and paste your code into their IDE and try it out for themselves? — Amedee Van Gasse, Mar 23 '17 at 17:28
I can not use the new iTextSharp because it does not have LGPL license and the old one does not contain XMLWorker. First tests show that the capabilities of HTMLWorker are enough for my scenario. As described above my problem is that I can not add the html body and the pdf attachments to the same Document. — , Mar 23 '17 at 17:30
Licenses are not a technical but a legal issue; that's something that your manager should take care of. :) About that sample? And anyone trying out your code will probably do that with the latest version anyway. Just so you know. — Amedee Van Gasse, Mar 23 '17 at 17:33
Well any sample (X)HTML body is enough. How about this already string escaped example: \r\n\r\n\r\n\r\n\r\n Title of document\r\n\r\n\r\n\r\n some content\r\n\r\n\r\n — , Mar 23 '17 at 17:35

COeDev · Accepted Answer · 2017-03-24T13:59:16.170

From v4.1.6 fanboy to v4.1.6 fanboy :D

It looks like the HTMLWorker closes the document's stream right after parsing. As a workaround, you could create a PDF from your mail body in memory and then add it, together with the attachments, to your final PDF.

Here is some code that should do the trick:

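  // Namespaces used: System.IO, iTextSharp.text, iTextSharp.text.pdf,
  // iTextSharp.text.html.simpleparser (for HTMLWorker).

  // Step 1: render the HTML mail body to a PDF held in memory.
  StringReader htmlStringReader = new StringReader("<html><body>Hello World!!!!!!</body></html>");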

  byte[] htmlResult;

  using (MemoryStream htmlStream = new MemoryStream())
  {
    Document htmlDoc = new Document();
    PdfWriter htmlWriter = PdfWriter.GetInstance(htmlDoc, htmlStream);
    htmlDoc.Open();

    HTMLWorker htmlWorker = new HTMLWorker(htmlDoc);
    htmlWorker.Parse(htmlStringReader);

    htmlDoc.Close();
    htmlResult = htmlStream.ToArray();
  }

  // Step 2: copy the pages of the body PDF and of each attachment into one document.
  byte[] pdfResult;

  using (MemoryStream pdfStream = new MemoryStream())
  {
    Document doc = new Document();
    PdfCopy copyWriter = new PdfCopy(doc, pdfStream);
    doc.Open();

    PdfReader htmlPdfReader = new PdfReader(htmlResult);
    AppendPdf(copyWriter, htmlPdfReader); // your foreach pdf code here
    htmlPdfReader.Close();

    PdfReader attachmentReader = new PdfReader("C:\\temp\\test.pdf");
    AppendPdf(copyWriter, attachmentReader);
    attachmentReader.Close();

    doc.Close();

    pdfResult = pdfStream.ToArray();
  }

  // Step 3: write the merged PDF to disk.
  using (FileStream fs = new FileStream("C:\\temp\\test2.pdf", FileMode.Create, FileAccess.Write))
  {
    fs.Write(pdfResult, 0, pdfResult.Length);
  }

// Copies every page of the reader's PDF into the PdfCopy writer (pages are 1-based).
private void AppendPdf(PdfCopy writer, PdfReader reader)
{
  for (int i = 1; i <= reader.NumberOfPages; i++)
  {
    PdfImportedPage page = writer.GetImportedPage(reader, i);        
    writer.AddPage(page);
  }
}

Of course, you could also write the final document directly to a FileStream instead of a MemoryStream.
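
The FileStream variant could look roughly like this. It is only a minimal, untested sketch that assumes the same iTextSharp 4.1.6 calls as the code above; the names PdfMerger, MergeToFile, bodyPdf and attachmentPaths are made up for illustration, and bodyPdf is expected to be the byte array produced from the HTML body in the first step.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfMerger
{
    // Hypothetical helper: writes the in-memory body PDF followed by every
    // attachment PDF straight to targetPath, without a second MemoryStream.
    public static void MergeToFile(byte[] bodyPdf, IEnumerable<string> attachmentPaths, string targetPath)
    {
        using (FileStream fs = new FileStream(targetPath, FileMode.Create, FileAccess.Write))
        {
            Document doc = new Document();
            PdfCopy copyWriter = new PdfCopy(doc, fs);
            doc.Open();

            // The converted mail body goes first.
            AppendPdf(copyWriter, new PdfReader(bodyPdf));

            // Then each attachment, in list order.
            foreach (string path in attachmentPaths)
            {
                AppendPdf(copyWriter, new PdfReader(path));
            }

            // Closing the document finishes the PDF, so do it before fs is disposed.
            doc.Close();
        }
    }

    // Copies all pages (1-based) of one source PDF and then releases the reader.
    private static void AppendPdf(PdfCopy writer, PdfReader reader)
    {
        try
        {
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                writer.AddPage(writer.GetImportedPage(reader, i));
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Usage would be something like PdfMerger.MergeToFile(htmlResult, pdfList, targetPath), with htmlResult from the code above and pdfList and targetPath from the question.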

I would not call myself a fanboy, my company just can't afford to spend hundreds of dollars for a function which is not a major part of our program. As a last resort I did exactly what you said. I create a .pdf from the mail.body in the Windows Temp folder and just add it to the list for the PdfCopy writer. I did not test your code but will make it the accepted answer, as you seem to know what you're talking about :) — , Mar 24 '17 at 16:17
No offense... that "fanboy" was just a little joke because, for exactly the same reasons, I'm stuck with v4.1.6, too :/ — COeDev, Mar 26 '17 at 17:25
No offense taken at all. Didn't mean to come across as harsh. Thanks for your input! — , Mar 27 '17 at 10:26
