TPL Dataflow Block with permanent Task/Thread

Question

Stepen Toub mentions in this Channel 9 Video that a *Block creates a task if an item was pushed to its incoming queue. If all items in queue are computed the task gets destroyed.

If I use a lot of blocks to build up a mesh to number of actually running tasks is not clear (and if the TaskScheduler is the default one the number of active ThreadPool threads is also not clear).

Does TPL Dataflow offers a way where I can say: "Ok I want this kind of block with a permanent running task (thread)?

score 3 · Accepted Answer · answered Jun 10 '17 at 03:22

TL;DR: there is no way to dedicate a thread to a block, as it's clearly conflicting with purpose of TPL Dataflow, except by implementing your own TaskScheduler. Do measure before trying to improve your application performance.

I just watched the video and can't find such phrase in there:

creates a task if an item was pushed to its incoming queue. If all items in queue are computed the task gets destroyed.

Maybe I'm missing something, but all that Stephen said is: [at the beginning] We have a common Producer-Consumer problem, which can be easily implemented with .Net 4.0 stack, but the problem is that if the data runs out, the consumer goes away from loop, and never return.

[After that] Stephen explains, how such problem can be solved with TPL Dataflow, and he said that the ActionBlock starts a Task if it wasn't started. Inside that task there is code which waits (in async fashion) for a new message, freeing up the thread, but not destroying the task.

Also Stephen mentioned task while explaining the sending messages across the linked blocks, and there he says that posting task will fade away if there is no data to send. It doesn't mean that a task corresponding to the block fades away, it's only about some child task being used to send data, and that's it.

In the TPL Dataflow the only way to say to the block that there wouldn't be any more data: by calling it's Complete method or completing any of linked blocks. After that consuming task will be stopped, and, after all buffered data being processed, the block will end it's task.

According the official github for TPL Dataflow, all tasks for message handling inside blocks are created as DenyChildAttach, and, sometimes, with PreferFairness flag. So, there is no reason for me to provide a mechanism to fit one thread directly to the block, as it will stuck and waste CPU resources if there is no data for the block. You may introduce some custom TaskScheduler for blocks, but right now it's not obvious why do you need that.

If you're worried that some block may get more CPU time than others, there is a way to leverage that effect. According official docs, you can try to set the MaxMessagesPerTask property, forcing the task restart after some amount of data being sent. Still, this should be done only after measuring actual execution time.

Now, back to your words:

number of actually running tasks is not clear
the number of active ThreadPool threads is also not clear

How did you profile your application? During debug you can easily find all active tasks and all active threads. If it's not enough, you can profile your application, either with native Microsoft tools or a specialized profiler, like dotTrace, for example. Such toolkit can easily provide you information about what's going on in your app.

Thanks for the super detailed answer. Now things are more clear to me. — Moerwald, Jun 10 '17 at 07:05

Theodor Zoulias · Answer 2 · 2022-09-27T12:55:36.140

The talk is about the internal machinery of the TPL Dataflow library. As a mechanism it is a quite efficient, and you shouldn't really worry about any overhead unless your intended throughput is in the order of 100,000 messages per second or more (in which case you should search for ways to chunkify your workload). Even with workloads with very small granularity, the difference between processing the messages using a single task for all messages, or a dedicated task for each one, should be hardly noticeable. A Task is an object that "weighs" normally a couple hundred bytes, and the .NET platform is able to create and recycle millions of objects of this size per second.

It would be a problem if each Task required its own dedicated 1MB thread in order to run, but this is not the case. Typically the tasks are executed using ThreadPool threads, and a single ThreadPool thread can potentially execute millions of short-lived tasks per second.

I should also mention that the TPL Dataflow supports asynchronous lambdas too (lambdas with Task return types), in which case the blocks essentially don't have to execute any code at all. They just await the generated promise-style tasks to complete, and for asynchronous waiting no thread is needed.

TPL Dataflow Block with permanent Task/Thread

2 Answers2