0

I'm not very experienced with asynchronous programming, so please excuse my ignorance.

I'm trying to generate a list of PDFs asynchronously to improve performance.

However, the code runs the same whether it's asynchronous or synchronous:

Parallel Test MS: 10452
Async Test MS: 9971
Sync Test MS: 10501

Is there anything obvious that I'm doing wrong, or is it the library? I'm using the following docs: https://ironpdf.com/docs/questions/async/

Main:

static async Task Main(string[] args)
        {
            var html = @"<h1>Hello World!</h1><br><p>This is IronPdfss.</p>";
            Stopwatch stopwatch = new Stopwatch();
            List<PdfDocument> pdfDocuments = new List<PdfDocument>();
            List<string> htmlStrings = new List<string>();
            for (int i = 0; i < iterations; i++)
                htmlStrings.Add(html);

            stopwatch.Start();
            Parallel.ForEach(htmlStrings, htmlString =>
            {
                var document = RenderPdf(htmlString);
                pdfDocuments.Add(document);
            });
            stopwatch.Stop();
            Console.WriteLine($"Parallel Test MS: {stopwatch.ElapsedMilliseconds}");

            stopwatch.Restart();
            var tasks = htmlStrings.Select(async h =>
            {
                var response = await RenderPdfAsync(h);
                pdfDocuments.Add(response);
            });
            await Task.WhenAll(tasks);
            stopwatch.Stop();
            Console.WriteLine($"Async Test MS: {stopwatch.ElapsedMilliseconds}");

            stopwatch.Restart();
            foreach (string h in htmlStrings)
            {
                var document = RenderPdf(h);
                pdfDocuments.Add(document);
            }
            stopwatch.Stop();
            Console.WriteLine($"Sync Test MS: {stopwatch.ElapsedMilliseconds}");

            Console.ReadLine();
        }

Helper Methods:

private static async Task<IronPdf.PdfDocument> RenderPdfAsync(string Html, IronPdf.PdfPrintOptions PrintOptions = null)
{
    return await Task.Run(() => RenderPdf(Html, PrintOptions));
}
private static IronPdf.PdfDocument RenderPdf(string Html, IronPdf.PdfPrintOptions PrintOptions = null)
{
    var Renderer = new IronPdf.HtmlToPdf();
    if (PrintOptions != null)
    {
        Renderer.PrintOptions = PrintOptions;
    }
    PdfDocument Pdf = Renderer.RenderHtmlAsPdf(Html);
    return Pdf;
}
Daniel Mann
DragonMasa
    As an aside that doesn't really deal with your performance question, that `Parallel.ForEach` is going to cause race conditions. Your list, `pdfDocuments`, might be written to multiple times simultaneously. – Joshua Robinson Jul 31 '20 at 18:43

4 Answers

3
private static async Task<IronPdf.PdfDocument> RenderPdfAsync(string Html, IronPdf.PdfPrintOptions PrintOptions = null)
{
    return await Task.Run(() => RenderPdf(Html, PrintOptions));
}

This is what's generally called "fake asynchrony". It's a method with an asynchronous signature that is not really asynchronous. It's just synchronous work run on a thread pool thread. So, the "asynchronous" code would behave very similarly to the parallel code: it runs each render on a thread pool thread.
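A minimal sketch of that point, using a hypothetical `SyncWork` stand-in rather than any IronPDF call: wrapping synchronous work in `Task.Run` gives it an awaitable signature, but the work itself still occupies a thread pool thread for its whole duration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FakeAsyncDemo
{
    // Hypothetical stand-in for a synchronous, CPU-bound render call.
    // It reports whether it is executing on a thread pool thread.
    public static bool SyncWork() => Thread.CurrentThread.IsThreadPoolThread;

    // Same shape as RenderPdfAsync in the question: an async-looking
    // signature that simply hands the synchronous work to Task.Run.
    public static Task<bool> SyncWorkAsync() => Task.Run(() => SyncWork());

    public static async Task Main()
    {
        bool onPoolThread = await SyncWorkAsync();
        Console.WriteLine($"Ran on a thread pool thread: {onPoolThread}");
    }
}
```

No thread is freed while the work runs, which is why this pattern performs much like `Parallel.ForEach` for CPU-bound work.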

In this case, the operation is CPU-bound, not I/O-bound, so synchronous or parallel code is the correct approach. E.g., I would think Parallel LINQ is the best approach. You wouldn't want to use asynchronous code here.
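As a rough sketch of the Parallel LINQ approach, assuming a hypothetical CPU-bound `RenderStub` in place of the question's `RenderPdf` (it is not an IronPDF call), the query below collects results through `ToList()` so that no thread writes to a shared `List<T>`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqSketch
{
    // Hypothetical CPU-bound stand-in for RenderPdf; returns a simple hash.
    public static int RenderStub(string html)
    {
        int hash = 17;
        for (int i = 0; i < 200_000; i++)
            foreach (char c in html)
                hash = unchecked(hash * 31 + c);
        return hash;
    }

    // Runs the stub over all inputs in parallel and gathers the results.
    public static List<int> RenderAll(IEnumerable<string> htmlStrings)
    {
        return htmlStrings
            .AsParallel()
            .AsOrdered()                  // keep results in input order
            .Select(h => RenderStub(h))   // runs across multiple threads
            .ToList();                    // PLINQ merges the results safely
    }

    public static void Main()
    {
        List<string> htmlStrings =
            Enumerable.Repeat("<h1>Hello World!</h1>", 8).ToList();

        List<int> results = RenderAll(htmlStrings);
        Console.WriteLine($"Rendered {results.Count} documents");
    }
}
```

`AsOrdered()` is optional; it is included here only on the assumption that the output order should match the input order.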

What's odd about your timings is that the parallel code is not faster than the synchronous code. One explanation for this is that the PDF rendering is already parallel, so additional parallelism wouldn't help. Another explanation is that something is restricting your application to only running on a single CPU core.
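One way to test the second explanation is to time a CPU-bound workload that has nothing to do with the PDF library. The sketch below uses a hypothetical `Spin` method as that workload; if the parallel run is not noticeably faster than the sequential one, the process is probably limited to a single core.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class CoreCheck
{
    // Hypothetical CPU-bound workload standing in for a single render.
    public static long Spin(int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i % 7;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine($"Logical processors visible: {Environment.ProcessorCount}");

        const int jobs = 8;
        const int size = 50_000_000;

        // Run the jobs one after another on the current thread.
        var sw = Stopwatch.StartNew();
        for (int j = 0; j < jobs; j++) Spin(size);
        Console.WriteLine($"Sequential MS: {sw.ElapsedMilliseconds}");

        // Run the same jobs through Parallel.For.
        sw.Restart();
        Parallel.For(0, jobs, _ => { Spin(size); });
        Console.WriteLine($"Parallel MS: {sw.ElapsedMilliseconds}");
    }
}
```

If this workload does speed up in parallel while the PDF rendering does not, the first explanation (the rendering itself is not parallelizable, or is already parallel) becomes the more likely one.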

Stephen Cleary
2

Take a look here:

        var tasks = htmlStrings.Select(async h =>
        {
            var response = await RenderPdfAsync(h);
            pdfDocuments.Add(response);
        });
        await Task.WhenAll(tasks);

You are awaiting in the Select, so you are just doing one at a time. Try doing something like this:

        var tasks = htmlStrings.Select(h =>
        {
            return RenderPdfAsync(h);
        }).ToList(); // materialize once so the lazy query is not re-run when enumerated again
        await Task.WhenAll(tasks);
        foreach(var t in tasks){ pdfDocuments.Add(await t); }

Keep in mind you are already using a proper parallel library above (Parallel.ForEach) and to keep things consistent, you should probably use that pattern here as well.

Andy
  • Hmm, there doesn't seem to be any difference with that code, Andy. How odd. – DragonMasa Jul 31 '20 at 18:49
  • 3
    This changes nothing about whether the operations run in parallel or not. The two snippets behave identically in either case. The only way in which the two differ is that the second uses the return value of the task, and the first adds to the list directly, as is explained in [this post](https://stackoverflow.com/q/19098143/). The fact that the timing of this method is comparable to a `Parallel.ForEach` call in the OP shows that there is some underlying behavior of the operation that's not parallelizable. – Servy Jul 31 '20 at 18:50
  • 2
    @DragonMasa -- I have used IronPDF before and it's a massive CPU hog. The bottleneck could be with IronPDF/CPU and not parallelization. – Andy Jul 31 '20 at 18:51
  • @Andy Do you have any suggestions for improving performance? – DragonMasa Jul 31 '20 at 18:54
  • 2
    @DragonMasa -- as snarky as this sounds, you could just get a better processor in your computer. Unfortunately, I never made it faster, no matter what tricks I used. I tried many PDF converters out there and I believe I finally settled on `SelectPdf`, but I don't believe the results were much better. They basically all use the same back-end to process HTML into PDF documents. – Andy Jul 31 '20 at 19:00
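To illustrate the point raised in the comments that awaiting inside the `Select` lambda does not serialize the work, here is a small sketch. `WorkAsync` is a hypothetical operation that uses `Task.Delay` as a placeholder for roughly 200 ms of work; it is not a PDF render.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class SelectAwaitDemo
{
    // Hypothetical stand-in for an operation that takes about 200 ms.
    public static async Task<int> WorkAsync(int id)
    {
        await Task.Delay(200);
        return id;
    }

    // Awaits inside the Select lambda, as in the question's code.
    public static Task<int[]> RunAllAsync(int count)
    {
        var tasks = Enumerable.Range(0, count).Select(async i =>
        {
            int result = await WorkAsync(i); // awaited inside the lambda
            return result;
        });
        // WhenAll enumerates the query, which starts every task before
        // any of them has completed.
        return Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        int[] results = await RunAllAsync(5);
        Console.WriteLine($"{results.Length} items in {sw.ElapsedMilliseconds} ms");
    }
}
```

With five items, the elapsed time should be close to a single delay rather than five delays back to back, because each lambda returns its task at the first `await` and the next one starts immediately.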
2

As of September 2021, the 2021.9 release of IronPDF handles parallelism significantly better than earlier versions.

True multithreading is working well for my team in both web and desktop application development.

This version is available on nuget: https://www.nuget.org/packages/IronPdf/

Tested on IronPdf Version 2021.9.3737....

C# & VB Code Examples:

  • IronPDF + Async : https://ironpdf.com/examples/async/

  • IronPDF + System.Threading : https://ironpdf.com/examples/threading/

  • IronPDF + Parallel.ForEach : https://ironpdf.com/examples/parallel/

     using System;
     using System.Collections.Generic;
     using System.Linq;
     using System.Threading.Tasks;
     using IronPdf;
    
    
     var Renderer = new IronPdf.ChromePdfRenderer();
    
     var htmls = new List<string>()  { "<h1>Html#1</h1>", "<h1>Html#2</h1>", "<h1>Html#3</h1>" };
    
     Parallel.ForEach(htmls, (html) => 
     {
             var pdfDocument = Renderer.RenderHtmlAsPdfAsync(html).Result;
             // do something with each pdfDocument  
     });
    
Stephanie
0

This line

var response = await RenderPdfAsync(h);

should be something like this:

var task = RenderPdfAsync(h);

Then await all the tasks together.

Z .