I have a function that I am using to pass some HTML to iTextSharp and generate a PDF from it. The function successfully generates the PDF, complete with CSS styling, but it does not run fast enough for our requirements.
The one area that takes a particularly long time to execute is the call to `XMLParser.Parse`, which I have seen take upwards of 14 seconds to complete for an 8-page document showing little more than a table of data headed by icons. While this method executes, I have noticed (in the Output window) three exceptions being thrown (and presumably caught) by iTextSharp or by code that iTextSharp calls into. These exceptions are:
- 'System.Collections.Generic.KeyNotFoundException' in mscorlib.dll
- 'iTextSharp.tool.xml.exceptions.NoDataException' in itextsharp.xmlworker.dll
- 'System.ArgumentException' in mscorlib.dll
The three exceptions repeat (albeit not necessarily in that order) until the Parse method has finished executing.
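To put a number on how often these exceptions repeat, and how much of the 14 seconds they could plausibly account for, one option is to count first-chance exceptions on the calling thread while `Parse` runs. The sketch below is a hypothetical diagnostic helper (`ExceptionCensus` is not part of iTextSharp); it assumes .NET Framework 4.5 or later, where `AppDomain.FirstChanceException` and `Environment.CurrentManagedThreadId` are available, and that `Parse` does its work on the calling thread:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.ExceptionServices;

// Hypothetical diagnostic helper (not part of iTextSharp).
public static class ExceptionCensus
{
    // Runs `action`, counts first-chance exceptions raised on the calling
    // thread grouped by exception type, and reports the elapsed time.
    public static Dictionary<string, int> Run(Action action, out TimeSpan elapsed)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));

        var counts = new Dictionary<string, int>();
        int threadId = Environment.CurrentManagedThreadId;

        EventHandler<FirstChanceExceptionEventArgs> handler = (sender, e) =>
        {
            // FirstChanceException is AppDomain-wide, so ignore other threads
            // (for example, other ASP.NET requests running at the same time).
            if (Environment.CurrentManagedThreadId != threadId) return;

            // TryGetValue keeps the handler itself from throwing.
            string name = e.Exception.GetType().FullName;
            counts.TryGetValue(name, out int count);
            counts[name] = count + 1;
        };

        AppDomain.CurrentDomain.FirstChanceException += handler;
        var stopwatch = Stopwatch.StartNew();
        try
        {
            action();
        }
        finally
        {
            stopwatch.Stop();
            AppDomain.CurrentDomain.FirstChanceException -= handler;
            elapsed = stopwatch.Elapsed;
        }
        return counts;
    }
}
```

Wrapping the existing call, e.g. `ExceptionCensus.Run(() => parser.Parse(reader), out TimeSpan elapsed)` where `parser` is a hypothetical local holding the `XMLParser` built in the function, would show whether there are a handful of these exceptions or thousands.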
Whilst I realize I don't need to handle these exceptions myself, I mention them because I am trying to improve the performance of this method, and I understand that throwing and catching exceptions can be expensive. What I am looking for is the cause of these exceptions: if they depend on bad data that I am passing in, which data is bad?
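`KeyNotFoundException` is what the `Dictionary<TKey, TValue>` indexer throws for a missing key, so one plausible (unconfirmed) source is library code that looks up a key and catches the miss rather than calling `TryGetValue`. The sketch below does not reproduce iTextSharp's code; it only illustrates that general .NET pattern so the two styles can be timed side by side. `LookupCost` and its methods are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Illustration of a general .NET pattern, not iTextSharp's actual code.
public static class LookupCost
{
    // Exception-driven lookup: every miss costs a throw plus a catch.
    public static string LookupWithCatch(Dictionary<string, string> map, string key, string fallback)
    {
        try
        {
            return map[key];
        }
        catch (KeyNotFoundException)
        {
            return fallback;
        }
    }

    // Exception-free lookup: a miss is just a false return value.
    public static string LookupWithTryGet(Dictionary<string, string> map, string key, string fallback)
    {
        return map.TryGetValue(key, out string value) ? value : fallback;
    }

    // Times `iterations` guaranteed misses using the given lookup strategy.
    public static TimeSpan TimeMisses(
        Func<Dictionary<string, string>, string, string, string> lookup, int iterations)
    {
        var map = new Dictionary<string, string> { ["font-family"] = "Arial" };
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            lookup(map, "missing-key", "default");
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Comparing `LookupCost.TimeMisses(LookupCost.LookupWithCatch, 10000)` with `LookupCost.TimeMisses(LookupCost.LookupWithTryGet, 10000)` should show the exception-driven version to be far slower per miss. Timings taken with the Visual Studio debugger attached (which is when the Output window reports these exceptions) typically exaggerate exception cost further, so it may be worth measuring a Release build without the debugger before concluding that the exceptions explain the 14 seconds.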
This is the HTML-to-PDF function as it currently stands. Note that I have already tried using `XMLWorkerFontProvider.DONTLOOKFORFONTS`, as suggested by the itext_so manual, but it made no difference to performance. I have also noticed a `NullReferenceException` being thrown and swallowed once by `PdfWriter.GetInstance`; I wondered whether this might also be related to the exceptions thrown in the `Parse` method.
```csharp
public static byte[] GeneratePdfFromHtml(string html, Action<PdfWriter, Document> pdfSettings = null, string additionalFooterText = null)
{
    var tagProcessor = (DefaultTagProcessorFactory)Tags.GetHtmlTagProcessorFactory();
    tagProcessor.RemoveProcessor(HTML.Tag.IMG);
    tagProcessor.AddProcessor(HTML.Tag.IMG, new CustomProcessorImageTag());

    using (var workStream = new MemoryStream())
    using (var document = new Document())
    //NOTE: The NullReferenceException is thrown via this call.
    using (PdfWriter writer = PdfWriter.GetInstance(document, workStream))
    {
        PdfEventHelper pdfEventHelper = new PdfEventHelper(additionalFooterText);
        writer.PageEvent = pdfEventHelper;
        writer.CloseStream = false;
        pdfSettings?.Invoke(writer, document);
        document.Open();

        var xmlWorkerFontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
        //Not noticeably faster without this font directory registration.
        //NOTE: "~/Content/fonts" is an application-relative virtual path; RegisterDirectory
        //probably expects a physical path (e.g. from Server.MapPath) to find any fonts.
        xmlWorkerFontProvider.RegisterDirectory("~/Content/fonts", false);
        //TODO: If further performance is needed then this line is the next slowest (.43ms).
        var htmlContext = new HtmlPipelineContext(new CssAppliersImpl(xmlWorkerFontProvider));
        htmlContext.SetTagFactory(tagProcessor);

        Func<string, string> mapPath = HttpContext.Current.Server.MapPath;
        var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
        foreach (var cssFileName in new[] { "bootstrap.min.css", "Pdf.css", "Rdp.css" })
            cssResolver.AddCssFile(mapPath($"~/Content/{cssFileName}"), true);

        using (var reader = new StringReader(html))
        {
            new XMLParser(
                new XMLWorker(
                    new CssResolverPipeline(
                        cssResolver,
                        new HtmlPipeline(
                            htmlContext,
                            new PdfWriterPipeline(
                                document,
                                writer))),
                    true))
                //TODO: Speed up this line - this is the slowest line in the method by far.
                //NOTE: This throws a series of ArgumentExceptions, NoDataExceptions and KeyNotFoundExceptions.
                .Parse(reader);
            document.Close();
            return workStream.ToArray();
        }
    }
}
```
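To find out exactly which method throws the `NullReferenceException` swallowed inside `PdfWriter.GetInstance` (or the `KeyNotFoundException` and friends raised under `Parse`), the call stack can be captured at the moment of the throw, before the library's own `catch` runs. The frames would point at the code doing the failing lookup, which is where any "bad data" would show up. A minimal sketch, assuming .NET Framework 4.5 or later; `FirstThrowSite` is a hypothetical helper, and file/line information appears only if PDB files are available:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.ExceptionServices;

// Hypothetical diagnostic helper (not part of iTextSharp).
public static class FirstThrowSite
{
    // Runs `action` and returns the call stack captured when the first exception
    // of type TException is thrown on the calling thread, or null if none is.
    public static string Capture<TException>(Action action) where TException : Exception
    {
        if (action == null) throw new ArgumentNullException(nameof(action));

        string captured = null;
        int threadId = Environment.CurrentManagedThreadId;

        EventHandler<FirstChanceExceptionEventArgs> handler = (sender, e) =>
        {
            if (captured != null || Environment.CurrentManagedThreadId != threadId) return;
            if (e.Exception is TException)
            {
                // The first-chance handler runs on the throwing thread before any
                // catch block, so the current stack still holds the throwing method
                // and its callers. Skip frame 0 (this handler).
                captured = new StackTrace(1, true).ToString();
            }
        };

        AppDomain.CurrentDomain.FirstChanceException += handler;
        try
        {
            action();
        }
        finally
        {
            AppDomain.CurrentDomain.FirstChanceException -= handler;
        }
        return captured;
    }
}
```

For example, `FirstThrowSite.Capture<NullReferenceException>(() => writer = PdfWriter.GetInstance(document, workStream))` would return the frames leading to the swallowed exception, which could then be compared with the frames captured for `KeyNotFoundException` during `Parse` to see whether the two are related.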