I have the following workflow that needs to happen in a non-blocking, parallel manner. I would like the method DoStuff() to return immediately, so I am using the Task Parallel Library.

DoStuff():
  Do some setup
  Parse an Excel file

  then for each row
   Fill Template with parsed values
   Convert filled template to Pdf
   Convert pdf to Tiff

  when all row processing has completed Create Summary Text File

  when summary text file has completed, Finalize

I'm stumbling a bit with the "when all row processing has been completed" step since I want to return immediately. Is the following roughly what I should be doing?

public Task<ProcessingResult> DoStuff() {
    // StartNew creates and schedules the task; a task built with `new Task(...)` never runs until Start() is called
    return Task.Factory.StartNew<SetupResult>(SetUp)
        .ContinueWith(ParseExcel, TaskContinuationOptions.OnlyOnRanToCompletion)
        .ContinueWith(excelProcessing => {
            var templateProcessing = excelProcessing.Result.RowParsing
                .Select(template =>
                  Task.Factory.StartNew(() => FillTemplate(template))
                       .ContinueWith(ConvertToPdf, TaskContinuationOptions.OnlyOnRanToCompletion)
                       .ContinueWith(ConvertToTiff, TaskContinuationOptions.OnlyOnRanToCompletion)
                ).ToArray();

            //-------------------------------------------------------------
            // This is the part that seems weird
            //-------------------------------------------------------------
            Task.Factory.ContinueWhenAll(templateProcessing, t => { }).Wait();
            return new TemplatesProcessingResult(templateProcessing);
        }, TaskContinuationOptions.OnlyOnRanToCompletion)
        .ContinueWith(CreateSummaryFile, TaskContinuationOptions.OnlyOnRanToCompletion)
        .ContinueWith(FinalizeProcessing, TaskContinuationOptions.OnlyOnRanToCompletion);
}
George Mauer

1 Answer

I think you're getting confused because you are trying to wire up all those components as continuations of the original task. If there is no compelling reason to make every one of those calls a continuation, then all of this can be done in a single background task.

var task = Task.Factory.StartNew(() =>
   {
        // setup
        // var stuff = ParseFile()

        // Processes the rows in parallel. ToArray() blocks this thread until every row has
        // completed; if any row throws, the exception bubbles up as an AggregateException.
        var transformations = stuff.RowParsing.AsParallel().Select(x =>
           {
                return FillTemplate(x);
           }).ToArray();

        // create summary text file

        // Finalize

        return processingResult;
   });

Basically, you can do all of that inside a single task and not have to worry about it. Marking up all those steps as continuations is pretty convoluted for what you need to do.
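If the per-row steps do need to stay as separate tasks, the `.Wait()` from the question can probably be avoided by returning the `ContinueWhenAll` task itself and flattening it with `Unwrap()`. The following is only a sketch under assumptions: `RowPipeline`, `ProcessRows`, and the string-based `FillTemplate`, `ConvertToPdf`, and `ConvertToTiff` stand-ins are hypothetical and not the asker's real types.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class RowPipeline
{
    // Hypothetical stand-ins for the real per-row steps.
    static string FillTemplate(int row) { return "filled-" + row; }
    static string ConvertToPdf(string filled) { return filled + ".pdf"; }
    static string ConvertToTiff(string pdf) { return pdf + ".tiff"; }

    // Returns immediately; the returned task completes once every row is done.
    public static Task<string[]> ProcessRows(Task<int[]> parsing)
    {
        return parsing.ContinueWith(parsed =>
        {
            // One task chain per row; each chain is scheduled right away.
            Task<string>[] rowTasks = parsed.Result
                .Select(row => Task.Factory.StartNew(() => FillTemplate(row))
                    .ContinueWith(t => ConvertToPdf(t.Result), TaskContinuationOptions.OnlyOnRanToCompletion)
                    .ContinueWith(t => ConvertToTiff(t.Result), TaskContinuationOptions.OnlyOnRanToCompletion))
                .ToArray();

            // Hand back the "all rows finished" task instead of waiting on it.
            return Task.Factory.ContinueWhenAll<string, string[]>(
                rowTasks,
                done => done.Select(t => t.Result).ToArray());
        }, TaskContinuationOptions.OnlyOnRanToCompletion)
        .Unwrap(); // Task<Task<string[]>> becomes Task<string[]>
    }
}
```

The summary and finalize steps could then be chained onto the task returned by `ProcessRows` with ordinary `ContinueWith` calls, so no thread is parked while the rows are processed.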

Your calling code can then simply block on the `Result` property of `task` to get the result of the asynchronous call:

  try
  {
      var result = task.Result;
  }
  catch(AggregateException e)
  {
      e.Flatten().Handle(ex => 
        {
             // Do Stuff, return true to indicate handled
             return true;
        });
  }

However, the one thing you need to be aware of is exceptions. If this is going to be a fire-and-forget task and nothing ever observes a failure, the exception is going to bubble all the way up and can potentially kill your process.
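For the fire-and-forget case, one common way to make sure a failure is always observed is to attach a continuation that runs only when the task faults. This is a minimal sketch, not part of the original example: `FireAndForget` and `LogFaults` are hypothetical names, and the `log` callback stands in for whatever logging you actually use.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForget
{
    // Hypothetical helper: runs `log` only if `task` faults. Reading
    // t.Exception marks the exception as observed, so it is not left
    // unhandled when nobody ever touches task.Result or calls Wait().
    public static Task LogFaults(this Task task, Action<Exception> log)
    {
        return task.ContinueWith(
            t => log(t.Exception.Flatten()),
            TaskContinuationOptions.OnlyOnFaulted);
    }
}
```

With that in place, something like `task.LogFaults(ex => Console.Error.WriteLine(ex))` can be called right after the task is created. Note that the continuation returned by `LogFaults` ends up canceled when the original task succeeds, so it is meant to be ignored rather than waited on.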

Tejs
  • The return type of the whole method in the question is `Task`, so I think the exception should go into the `ProcessingResult`, you shouldn't hide it. – svick Apr 16 '12 at 19:17
  • Definitely. I'll fix the example, as I didn't notice his task result type. – Tejs Apr 16 '12 at 19:18
  • Yeah, I've been two feet in JavaScript land for a while and using Deferreds, so this was what seemed most natural. The reason I have them all as concrete steps is because they are, but you're right that some of this stuff doesn't have to be continuations. Each run of the foreach, however, can safely be in parallel, so this by itself is not the solution I'm looking for. – George Mauer Apr 16 '12 at 19:18
  • If you want the foreach stuff to be parallelized, that's easy enough to do. One second, and I can update with that... – Tejs Apr 16 '12 at 19:21
  • So just to be clear, calling AsParallel() will block the thread until all the parallel stuff is finished? – George Mauer Apr 16 '12 at 19:27
  • Correct - for example `var vals = Enumerable.Range(0, 10).AsParallel().Select(x => x * x).ToList()` - the assignment to `vals` occurs after all the threads have completed the delegate assigned to the `Select` extension. – Tejs Apr 16 '12 at 19:29