Avoiding use of ActionBlock.Post when PostDataflowBlockOptions.BoundedCapacity is not the default value?

Question

I've heard that you can lose information if you use the Post method instead of the SendAsync method of an ActionBlock<T> object when you decide to use its BoundedCapacity option.

Could someone please explain why that is so?

Theodor Zoulias · Accepted Answer · 2020-05-26T12:04:59.990

4

The Post method attempts to post an item synchronously and returns true or false, depending on whether the block accepted the item or not. Reasons to not accept an item:

1. The block is marked as completed (by calling its Complete method).
2. The block is completed, either successfully or unsuccessfully (its Completion.IsCompleted property returns true).
3. The block has a bounded capacity (option BoundedCapacity != -1), and its buffer is currently full.
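
As a minimal sketch of the rejection cases above (the class name `PostDemo`, the `Run` helper, and the gate that keeps the block busy are illustrative assumptions, not part of the original answer):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public static class PostDemo
{
    public static (bool First, bool Second, bool Third) Run()
    {
        // The gate keeps the action from finishing, so the block stays full.
        using var gate = new ManualResetEventSlim(false);

        var block = new ActionBlock<int>(
            _ => gate.Wait(),
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        bool first = block.Post(1);   // accepted: the block has room
        bool second = block.Post(2);  // rejected: item 1 occupies the whole capacity

        gate.Set();                   // let the action finish
        block.Complete();             // mark the block as completed
        bool third = block.Post(3);   // rejected: the block no longer accepts items

        block.Completion.Wait();
        return (first, second, third);
    }

    public static void Main() => Console.WriteLine(Run());
}
```

Note that the item currently being processed counts toward BoundedCapacity, which is why the second Post is rejected here even though nothing is waiting in the queue.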

The SendAsync method attempts to post an item asynchronously and returns a Task<bool>. The returned task is already completed in every case except one: the block has a bounded capacity, its buffer is currently full, and the block is neither completed nor marked as completed. This is the only case in which SendAsync behaves asynchronously. After awaiting the task, its bool result indicates whether the block accepted the item or not. Reasons to not accept an item:

1. The block was marked as completed either before calling the SendAsync, or during the awaiting.
2. The block was completed either before calling the SendAsync, or during the awaiting as a result of an exception, or because its Fault method was invoked.

So the difference between Post and SendAsync is point (3) of the first list: they behave differently in the case of a bounded-capacity block with a full buffer. In this case Post immediately rejects the item, while SendAsync waits asynchronously and the item is accepted once the buffer has free space again.
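
A small sketch of that difference, assuming a block with BoundedCapacity = 1 that is kept busy by a `TaskCompletionSource` (the names `SendAsyncDemo`, `RunAsync`, and `gate` are made up for illustration):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class SendAsyncDemo
{
    public static async Task<(bool FirstDone, bool SecondDone, bool SecondResult, bool AfterComplete)> RunAsync()
    {
        // The action does not finish until the gate is released.
        var gate = new TaskCompletionSource<bool>();
        var block = new ActionBlock<int>(
            async _ => await gate.Task,
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        Task<bool> first = block.SendAsync(1);   // room available: task is already completed
        Task<bool> second = block.SendAsync(2);  // buffer full: task stays pending
        bool firstDone = first.IsCompleted;
        bool secondDone = second.IsCompleted;

        gate.SetResult(true);                    // item 1 finishes, freeing space
        bool secondResult = await second;        // item 2 is accepted late, not dropped

        block.Complete();
        bool afterComplete = await block.SendAsync(3); // declined: block marked as completed
        await block.Completion;

        return (firstDone, secondDone, secondResult, afterComplete);
    }

    public static async Task Main() => Console.WriteLine(await RunAsync());
}
```

With Post in place of the second SendAsync call, item 2 would have been rejected on the spot instead of being accepted after the gate is released.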

In most cases the behavior of SendAsync is the desirable one. Using Post instead of SendAsync can be seen as a bug waiting to happen: some time later the block may be reconfigured as bounded, to solve newly discovered problems related to excessive memory usage, and Post will then start rejecting items.

It is a good idea not to ignore the return value of either method, because a return value of false in most cases indicates a bug. It is quite rare to expect and be ready to handle a false result. Some ideas:

if (!block.Post(item)) throw new InvalidOperationException();

if (!await block.SendAsync(item)) throw new InvalidOperationException();

var accepted = block.Post(item); Debug.Assert(accepted);

var accepted = await block.SendAsync(item); Debug.Assert(accepted);

edited May 26 '20 at 12:04

answered May 26 '20 at 11:30

Theodor Zoulias


If you're fully in control and know with 100% certainty that the datablock has not been completed, isn't it completely safe to disregard the actual return values of the `SendAsync` method? For `Post` it's important, because if you indeed had a bounded capacity set, your data element wouldn't have been queued for work (if it was processing the maximum amount of elements as specified in the `BoundedCapacity` property) – SpiritBob May 26 '20 at 12:10
@SpiritBob yes, if you are writing a throw-away program that you don't expect to touch again, by all means you can spare yourself the considerations about the return value of these methods. But if you are writing a program that is expected to evolve, saving yourself some seconds today (by omitting the `Debug.Assert`) may cost you some hours of frustration in the future. Or even worse, it may cost you not only frustration but also embarrassment. No one appreciates a program that produces incorrect results, and it is difficult to evaluate the cost of losing the trust of your customers. – Theodor Zoulias May 26 '20 at 12:29
@SpiritBob sorry if my previous comment was a bit harsh. It reveals how terrified I am of the bug that may occur. It is unpredictable, silent, and can cause permanent damage to the data. And all it takes is to add a benign-looking configuration at a later stage of a project. – Theodor Zoulias May 26 '20 at 22:36
The problem in my specific situation is that I can't allow the luxury of waiting for `SendAsync` to complete. The best I could do is check if the task's `IsFaulted` property is true, immediately after firing that method? – SpiritBob May 27 '20 at 07:01
@SpiritBob the task returned by `SendAsync` never fails. So its `IsFaulted` property will never become `true`. If you can't wait asynchronously for the `Task` to complete, then probably the `Post` method is more suitable in your case. Just make sure to check its return value, and act accordingly if it returns `false`. If this is still not an option, then the last alternative is to not use the `BoundedCapacity` setting. – Theodor Zoulias May 27 '20 at 09:16
The only solution I can potentially see to addressing the usage of `SendAsync` whilst checking it for faults, would be to use `ContinueWith` and point it to a method where I can potentially log/report the issue at hand. Do you agree with this approach? – SpiritBob May 29 '20 at 07:24
@SpiritBob it depends on what you would like to happen in case that the buffer is full. Are you OK with dropping the message, and just logging somewhere that the message was dropped? If yes, then using the `Post` is sufficient. Just check its return value, and if it's `false`, log it. – Theodor Zoulias May 29 '20 at 08:15
@SpiritBob having incomplete tasks hanging around will not help alleviate the memory pressure. Most probably it is going to increase it, because then not only will the superfluous messages still reside in memory, but so will the associated `SendAsync` tasks and their continuations. – Theodor Zoulias May 29 '20 at 08:29
That's the problem - I want to schedule them for completion when the bounded capacity is full. But if we want to make the system bullet-proof against future changes to the project, as you say, by adding verbosity/error logging, I don't see a way except by using `ContinueWith` **if** we accept that I can't await the actual task at hand. Perhaps I can spin up a dedicated OS thread, specifically to deal with these tasks by storing them in a queue of sorts and awaiting each one's completion (FIFO). Though I think the dedicated OS thread is a bit of an overkill for this issue. – SpiritBob May 29 '20 at 08:46
@SpiritBob my opinion is, if you don't know how to handle the case of excessive memory usage, just leave it unhandled. Don't set the `BoundedCapacity` option, and in the (hopefully) rare and unfortunate case that billions of unprocessed messages have found their way to the buffer of a block, let the application die from the inevitable `OutOfMemoryException`. It is just not possible to store unlimited items to limited space. Something will have to give. – Theodor Zoulias May 29 '20 at 09:08

@TheodorZoulias I have to say your explanation is the best I've seen even more so than the actual documentation. Thanks for clarifying it for me. – Ken Hadden May 12 '22 at 22:04

score 2 · Answer 2 · answered May 26 '20 at 08:23

Yes, you can lose information. Post has a higher potential to do that, but SendAsync can also lose information. Let's say you have an ActionBlock whose action takes 1000 ms to complete, and during this period 10 messages are posted. BoundedCapacity is set to 5 for the ActionBlock. As a result, the last 5 messages are not processed and their information is lost.

Here are some details about it: TPL Dataflow, whats the functional difference between Post() and SendAsync()?

See second answer.
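
The scenario described in this answer can be reproduced with a rough sketch like the following. It assumes a gate in place of the 1000 ms action so the outcome does not depend on timing, and a short `Task.Delay` for the SendAsync half; `CapacityDemo` and `RunAsync` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CapacityDemo
{
    public static async Task<(int Accepted, int Processed)> RunAsync()
    {
        var options = new ExecutionDataflowBlockOptions { BoundedCapacity = 5 };

        // Post: the block is held busy, so only the first 5 of 10 items fit.
        using var gate = new ManualResetEventSlim(false);
        var posted = new ActionBlock<int>(_ => gate.Wait(), options);
        int accepted = Enumerable.Range(1, 10).Count(i => posted.Post(i));
        gate.Set();
        posted.Complete();
        await posted.Completion;

        // SendAsync: awaiting each call waits for free space instead of dropping.
        int processed = 0;
        var sent = new ActionBlock<int>(async _ =>
        {
            await Task.Delay(10); // stand-in for slow work
            Interlocked.Increment(ref processed);
        }, options);
        foreach (int i in Enumerable.Range(1, 10))
            await sent.SendAsync(i);
        sent.Complete();
        await sent.Completion;

        return (accepted, processed);
    }

    public static async Task Main()
    {
        var (accepted, processed) = await RunAsync();
        Console.WriteLine($"Post accepted {accepted} of 10; SendAsync processed {processed} of 10");
    }
}
```

In this sketch the SendAsync half loses nothing because every call is awaited and the block is completed only after the last send; messages could still be declined if the block were completed while sends were pending.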
