
I'm working on a C# app with a time-consuming sequential workflow that must be performed asynchronously. It starts when the user presses a button and the app receives a few images captured from a camera within just a few milliseconds. The workflow then:

  1. Saves the images to disk.
  2. Aligns them.
  3. Generates 3D data from them.
  4. Groups them into a larger, collective object (called a "Scan").
  5. Adds optional analysis data to this scan and executes it.
  6. Finally, saves the scan itself to an XML file alongside the images.

Some of these steps are optional and configurable.

Since the processing can take so long, there will often be a queue of "scans" awaiting processing. So I need to present the user with a visual representation of the queue of captured scans and their current processing state (e.g. "Saving", "Analyzing", "Finished", etc.).

I've looked into using TPL DataFlow for this. But while the mesh is simple to create, I'm not getting just how I might monitor the status of what is going on so that I can update a user interface. Do I try to link custom action blocks that post back messages to the UI for that? Something else?

Is TPL Dataflow even the right tool for this job?

Joe
  • For UI representation of the progress you can use the [IProgress](https://blogs.msdn.microsoft.com/dotnet/2012/06/06/async-in-4-5-enabling-progress-and-cancellation-in-async-apis/) interface. TPL blocks can be used as IObservable so you can react on each new message coming along your pipeline. – VMAtm Feb 01 '18 at 07:11
  • I did almost go that route but Micky D's answer proved to be the approach I was looking for. Thanks anyway. – Joe Feb 02 '18 at 05:57
  • Tuples can be hard for your memory – VMAtm Feb 02 '18 at 06:10
  • I actually didn't use Tuples. I was less of a purist than Mickey. Just one "Job" object to wrap everything. – Joe Feb 02 '18 at 06:18
  • Related: [Using AsObservable to observe TPL Dataflow blocks without consuming messages](https://stackoverflow.com/questions/44579543/using-asobservable-to-observe-tpl-dataflow-blocks-without-consuming-messages) – Theodor Zoulias Jun 21 '20 at 14:28

1 Answer


Reporting Overall Progress

When you consider that a TPL DataFlow graph has a beginning and end block and that you know how many items you posted into the graph, all you need do is track how many messages have reached the final block and compare it to the source count of messages that were posted into the head. This will allow you to report progress.

Now this works trivially if the blocks are 1:1 - that is, for any message in there is a single message out. If there is a one:many block, you will need to change your progress reporting accordingly.
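As a rough illustration of the counting idea for the 1:1 case, here is a minimal sketch. The `OverallProgressDemo` class, the doubling `work` block and the `reported` queue are placeholders invented for this example, not part of the original workflow.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class OverallProgressDemo
{
    // Posts `total` dummy items through a 1:1 pipeline and returns the
    // percentage recorded each time an item reaches the final block.
    public static async Task<int[]> RunAsync(int total)
    {
        var reported = new ConcurrentQueue<int>();
        int completed = 0;

        // Hypothetical stand-in for the real processing steps.
        var work = new TransformBlock<int, int>(n => n * 2);

        // Final block: count arrivals and compare with the number posted.
        var tail = new ActionBlock<int>(_ =>
        {
            int done = Interlocked.Increment(ref completed);
            reported.Enqueue(done * 100 / total);
        });

        work.LinkTo(tail, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < total; i++)
        {
            work.Post(i);
        }

        work.Complete();
        await tail.Completion;

        return reported.ToArray();
    }
}
```

In a UI application the `Enqueue` call could be swapped for `IProgress<int>.Report`, with a `Progress<int>` created on the UI thread so that the callback runs there.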

Reporting Job Stage Progress

If you wish to present progress of a job as it travels throughout the graph, you will need to pass job details to each block, not just the data needed for the actual block. A job is a single task that must span all the steps 1-6 listed in your question.

So, for example, step 2 may require image data in order to perform alignment, but it does not care about filenames, how many steps there are in the job, or anything else job-related. The block input alone carries too little detail to know the state of the current job, and it makes looking up the original job difficult. You could refer to some external dictionary, but graphs are best designed when they are isolated and deal only with data passed into each block.

So a simple example would be to change this minimal code from:

var alignmentBlock = new TransformBlock<Image, Image>(n => { ... });

...to:

var alignmentBlock = new TransformBlock<Job, Job>(job => 
{
     job.Stage = Stages.Aligning;

     // perform alignment here
     job.Aligned = ImageAligner.Align (job.Image, ...);

     // report progress 
     job.Stage = Stages.AlignmentComplete;

     // a TransformBlock must return its output so the job reaches the next block
     return job;
});

...and repeat the process for the other blocks.

The stage property could fire a PropertyChanged notification or use any other form of notification pattern suitable for your UI.
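One possible shape for such a property, assuming `INotifyPropertyChanged` is the notification pattern your UI binds to. The `Job` class below is a stripped-down, hypothetical version holding only the stage, and `StageDemo` and the `Queued` stage exist just to exercise it.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public enum Stages { Queued, Aligning, AlignmentComplete }

// Hypothetical Job wrapper; only the Stage property matters here.
public sealed class Job : INotifyPropertyChanged
{
    private Stages _stage = Stages.Queued;

    public event PropertyChangedEventHandler PropertyChanged;

    public Stages Stage
    {
        get => _stage;
        set
        {
            if (_stage == value) return;
            _stage = value;
            // Raised on the block's thread; a UI would need to marshal
            // this to its own thread before touching controls.
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Stage)));
        }
    }
}

public static class StageDemo
{
    // Sends one job through a single block and returns the stages observed.
    public static async Task<List<Stages>> RunAsync()
    {
        var seen = new List<Stages>();
        var job = new Job();
        job.PropertyChanged += (sender, e) => seen.Add(job.Stage);

        var alignmentBlock = new TransformBlock<Job, Job>(j =>
        {
            j.Stage = Stages.Aligning;
            // the actual alignment work would go here
            j.Stage = Stages.AlignmentComplete;
            return j;
        });

        var sink = new ActionBlock<Job>(_ => { });
        alignmentBlock.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        alignmentBlock.Post(job);
        alignmentBlock.Complete();
        await sink.Completion;

        return seen;
    }
}
```

The event fires on whichever thread the block runs on, so a UI bound to `Stage` will usually need to dispatch the update to the UI thread.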

Notes

Now you will notice that I introduce a Job class that is passed as the only argument to each block. Job contains input data for the block as well as being a container for block output.

Now this will work, but the purist in me feels that it would be better to keep job metadata separate from the TPL block input and output; otherwise there is potential for state damage from multiple threads.

To get around this you may want to consider using Tuple<> and passing that into the block.

e.g.

var alignmentBlock = new TransformBlock<Tuple<Job, UnalignedImages>, 
                                        Tuple<Job, AlignedImages>>(n => { ... });
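To make the tuple idea concrete, here is a small sketch. The `Job`, `UnalignedImages` and `AlignedImages` types and their `Name` and `Count` members are hypothetical placeholders, and the "alignment" only copies a count across.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical types: Job holds only metadata, the image types hold block data.
public sealed class Job
{
    public Job(string name) { Name = name; }
    public string Name { get; }
}

public sealed class UnalignedImages { public int Count { get; set; } }
public sealed class AlignedImages { public int Count { get; set; } }

public static class TupleDemo
{
    public static async Task<Tuple<Job, AlignedImages>> RunAsync()
    {
        var alignmentBlock = new TransformBlock<Tuple<Job, UnalignedImages>,
                                                Tuple<Job, AlignedImages>>(input =>
        {
            // Item1 is the job metadata, passed through untouched.
            // Item2 is the data this block actually works on.
            var aligned = new AlignedImages { Count = input.Item2.Count };
            return Tuple.Create(input.Item1, aligned);
        });

        alignmentBlock.Post(Tuple.Create(new Job("scan-1"),
                                         new UnalignedImages { Count = 3 }));
        alignmentBlock.Complete();

        // Pull the single result back out of the block.
        return await alignmentBlock.ReceiveAsync();
    }
}
```

Because the block creates a new output object instead of mutating a shared one, each stage only ever touches the data handed to it. A single wrapper object, as in the earlier example, is simpler if that separation is not needed.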
  • Unfortunately I'm not trying to just to monitor "working" and "done". I'm trying to monitor steps along the way. "Saving" "Aligning" "Generating 3d data", etc – Joe Feb 01 '18 at 04:00
  • This looks great and I really appreciate the detailed answer. I'll try to digest it today – Joe Feb 01 '18 at 14:09
  • This worked out exceedingly well. Beyond my expectations. I have to admit, I wasn't too impressed with TPL Dataflow from the examples and explanations I saw. But actually using it really turned me around. Thank you. – Joe Feb 02 '18 at 05:55
  • @Joe wonderful. Great to hear. I love DataFlow too. Wishing you well –  Feb 02 '18 at 06:40
  • @Joe oh keep an eye out for any writings including his blog and books by Stephen Cleary. He’s a great teacher –  Feb 02 '18 at 09:35
  • Thanks, I already own his concurrency book. I'd been poring through it, seeing all sorts of interesting stuff, but really needed to dive into it to really get it – Joe Feb 02 '18 at 15:17