I have a foreach() that loops through 15 reports and generates a PDF for each. PDF generation is slow (about 3 seconds each), but if I could generate them all concurrently with threads, maybe all 15 could be done in 4-5 seconds total. One constraint is that the function must not return until ALL PDFs have been generated. Also, will 15 concurrent worker threads cause problems or instability for .NET/Windows?

Here is my pseudocode:

private void makePDFs(string path) {
  string[] folders = Directory.GetDirectories(path);

  foreach (string folderPath in folders) {
    generatePDF(...);
  }

  // DO NOT RETURN UNTIL ALL PDFs HAVE BEEN GENERATED
}

What is the simplest way to achieve this?

Andrei
HerrimanCoder

3 Answers

4

The most straightforward approach is to use Parallel.ForEach:

private void makePDFs(string path)
{
    string[] folders = Directory.GetDirectories(path);

    Parallel.ForEach(folders, folderPath =>
    {
        generatePDF(folderPath);
    });

    // WILL NOT RETURN UNTIL ALL PDFs HAVE BEEN GENERATED
}

This way you avoid having to create, keep track of, and await each separate task; the TPL does it all for you.
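On the question of whether 15 concurrent workers are a problem: Parallel.ForEach schedules iterations on the thread pool rather than creating one thread per item, and the degree of parallelism can be capped with ParallelOptions. Below is a minimal sketch of that, not the asker's actual code. `ParallelPdfSketch`, `GeneratePdf`, the `maxWorkers` parameter, and the 100 ms sleep are all hypothetical stand-ins.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelPdfSketch
{
    // Hypothetical stand-in for the real generatePDF: sleeps to simulate slow work,
    // then records the folder it handled in a thread-safe collection.
    private static void GeneratePdf(string folderPath, ConcurrentBag<string> done)
    {
        Thread.Sleep(100);
        done.Add(folderPath);
    }

    // Returns the number of folders processed. The call does not return
    // until every iteration has finished.
    public static int MakePdfs(string[] folders, int maxWorkers)
    {
        var done = new ConcurrentBag<string>();

        // Cap how many iterations may run at the same time (assumed tuning knob).
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        // Blocks the calling thread until all iterations have completed.
        Parallel.ForEach(folders, options, folderPath => GeneratePdf(folderPath, done));

        return done.Count;
    }
}
```

A sensible value for `maxWorkers` depends on whether the PDF generator is CPU-bound or mostly waiting on I/O, so it is worth measuring rather than guessing.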

John Wu
  • John, this looks promising, but inside my generatePDF() function is a call to HttpContext.Current, which is null when inside the context of Parallel.ForEach(). When I run it the old (slow) way, I'm able to reference this object without errors. Any ideas? – HerrimanCoder Sep 15 '17 at 17:28
  • You've run into [this issue](https://stackoverflow.com/questions/26710980/parallel-foreach-error-httpcontext-current). What data elements do you need from `HttpContext`? – John Wu Sep 15 '17 at 17:31
  • When I pass in HttpContext things go wrong: even though it's not null, the same context keeps getting reused, and the URL that the PDF generator points to seems to be the same each time. I'm going to accept your answer and open a new question, because it's tangential at this point. At the core, Parallel.ForEach() does its job nicely! – HerrimanCoder Sep 15 '17 at 21:51
  • I wouldn't try to clone HttpContext. Instead, extract the values you need as primitive types, e.g. if you need to know a cookie value, extract it in the main thread and pass it as a string. HttpContext is bound to the thread (sort of, it's complicated). – John Wu Sep 15 '17 at 21:59
  • It's not a matter of getting values, but more a matter of needing to run `Server.Execute()` inside the worker threads. I just wrote up the whole issue here, would love if you could take a peek: https://stackoverflow.com/questions/46248148/how-to-safely-use-server-execute-inside-a-worker-thread – HerrimanCoder Sep 15 '17 at 22:21
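
The advice in the comments above (read what you need on the request thread and hand the workers plain values) might look like the following sketch. It does not use the real HttpContext: `CurrentCookie` is a hypothetical ThreadLocal stand-in for thread-bound request state, and `BuildUrls` and the `"session-abc"` value are made up for illustration. It also does not address the Server.Execute() case raised in the last comment.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ContextCaptureSketch
{
    // Hypothetical stand-in for thread-bound request state such as HttpContext.Current.
    private static readonly ThreadLocal<string> CurrentCookie = new ThreadLocal<string>();

    public static string[] BuildUrls(string[] folders)
    {
        // Simulate state that exists only on the calling (request) thread.
        CurrentCookie.Value = "session-abc";

        // Read the value once, on the calling thread, as a plain string.
        string cookie = CurrentCookie.Value;

        var urls = new string[folders.Length];

        // Workers use only the captured string and never touch CurrentCookie.
        // Each iteration writes to its own array slot, so no locking is needed.
        Parallel.For(0, folders.Length, i =>
        {
            urls[i] = folders[i] + "?session=" + cookie;
        });

        return urls;
    }
}
```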
2

You need to build a list of tasks and then use Task.WhenAll to wait for all of them to complete:

var tasks = folders.Select(folder => Task.Run(() => generatePDF(...)));
await Task.WhenAll(tasks);

If you can't or don't want to use async/await, you can use:

Task.WaitAll(tasks.ToArray());

This blocks the current thread until all tasks have completed, so I'd recommend the first approach if you can use it.
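
Since the question asks for a method that does not return until every PDF is done, here is a minimal self-contained sketch of the blocking variant. `WaitAllSketch` and `GeneratePdf` are hypothetical stand-ins (the sleep simulates the slow generator), and the tasks are materialized with ToArray() because Task.WaitAll takes an array.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WaitAllSketch
{
    // Hypothetical stand-in for the real generatePDF; returns a fake output name.
    private static string GeneratePdf(string folderPath)
    {
        Thread.Sleep(100);
        return folderPath + ".pdf";
    }

    public static string[] MakePdfs(string[] folders)
    {
        var outputs = new string[folders.Length];

        // Start one task per folder; each task writes only to its own slot.
        Task[] tasks = Enumerable.Range(0, folders.Length)
            .Select(i => Task.Run(() => { outputs[i] = GeneratePdf(folders[i]); }))
            .ToArray();

        // Blocks the calling thread until every task has finished.
        // If any task failed, this throws an AggregateException.
        Task.WaitAll(tasks);

        return outputs;
    }
}
```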


You can also run your PDF generation in parallel using the Parallel class:

Parallel.ForEach(folders, folder => generatePDF(...));

Please see this answer to choose which approach works the best for your problem.

Andrei
0

.NET has a handy method just for this: Task.WhenAll(IEnumerable<Task>). It returns a task that completes once every task in the IEnumerable has finished. Because it is asynchronous, you need to await it.

var tasks = new List<Task>();
foreach(string folderPath in folders) {
    tasks.Add(Task.Run(() => generatePDF(folderPath)));
}
await Task.WhenAll(tasks);
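
For completeness, a self-contained version of the same idea is sketched below. It assumes the calling method can be made async, which changes its signature from void to Task. `WhenAllSketch`, `MakePdfsAsync`, and the `GeneratePdf` stand-in are hypothetical names. Unlike the snippet above, this sketch uses the generic Task.WhenAll overload so the results come back as an array.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for the real generatePDF; returns a fake output name.
    private static string GeneratePdf(string folderPath)
    {
        Thread.Sleep(100);
        return folderPath + ".pdf";
    }

    public static async Task<string[]> MakePdfsAsync(string[] folders)
    {
        var tasks = new List<Task<string>>();
        foreach (string folderPath in folders)
        {
            // Queue each PDF job on the thread pool.
            tasks.Add(Task.Run(() => GeneratePdf(folderPath)));
        }

        // Execution resumes here only after every task has completed.
        // Results are returned in the same order as the tasks in the list.
        return await Task.WhenAll(tasks);
    }
}
```

A caller awaits `MakePdfsAsync(folders)`. The awaiting method does not continue past that line until all PDFs exist, while the calling thread stays free in the meantime.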
Strikegently
  • Kyle, Task.Run() requires 1 or more params. There are 8 overloads for this and none are void. The first option is a param of type `Action`. What exactly should I be passing in there? Can you show sample code? – HerrimanCoder Sep 15 '17 at 17:06
  • You can pass it a function. From @Andrei : `Task.Run(() => generatePdf())`. I updated my answer to clarify. – Strikegently Sep 15 '17 at 17:16