6

I have a constant flow of items that I need to process in parallel, so I'm using TPL Dataflow. The catch is that items sharing the same key (similar to a Dictionary) must be processed in FIFO order and must not run in parallel with each other (they can run in parallel with items that have different keys).

The work being done is very CPU bound with minimal asynchronous locks, so my solution was to create an array of ActionBlock<T>s the size of Environment.ProcessorCount, each with no parallelism, and post to them according to the key's GetHashCode value.

Creation:

// One single-threaded block per core; MaxDegreeOfParallelism defaults to 1,
// so each block processes its items sequentially, preserving FIFO order.
_actionBlocks = new ActionBlock<Item>[Environment.ProcessorCount];
for (int i = 0; i < _actionBlocks.Length; i++)
{
    _actionBlocks[i] = new ActionBlock<Item>(item => ProcessItemAsync(item));
}

Usage:

bool ProcessItem(Key key, Item item)
{
    // Cast to uint so negative hash codes can't produce a negative index.
    var actionBlock = _actionBlocks[(uint)key.GetHashCode() % _actionBlocks.Length];
    return actionBlock.Post(item);
}

So, my question is, is this the best solution to my problem? Am I hurting performance/scalability? Am I missing something?

i3arnon
  • I like it. I can't think of another method that wouldn't require storage. I think that as long as you make sure your hash codes are properly distributed, this should be fine. – spender Jan 09 '14 at 02:23
  • Relying on the value of `GetHashCode` sounds very weird to me, why do you have it? Is the actual requirement “items that are equal should be processed in FIFO order”? – svick Jan 09 '14 at 03:02
  • @svick More like items with the same key should be processed in FIFO order, similar to how you would use a dictionary (it doesn't really have to be the same Item type). I'll update the question to make that clearer. – i3arnon Jan 09 '14 at 03:05
  • @I3arnon How do you know that all threads will have at least comparable amount of work to do? There is a chance that `(uint)key.GetHashCode() % _actionBlocks.Length` will be badly distributed and some cores won't do anything. – MarcinJuraszek Jan 09 '14 at 03:15
  • @MarcinJuraszek That's true. I've made sure the hashes are as evenly distributed as I can, and through testing I see that is indeed the case. But... this is one of the reasons I've put it out here. – i3arnon Jan 09 '14 at 03:18

1 Answer

3

I think your approach is reasonable, assuming you know the hash codes will be distributed well.

If you want better protection against bad distributions, you could use a larger number of ActionBlocks while limiting their total concurrency level by using a single custom TaskScheduler shared by all blocks. You can find such a scheduler in ParallelExtensionsExtras or on MSDN.
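
As a rough sketch of that idea (using the built-in ConcurrentExclusiveSchedulerPair to cap total concurrency rather than a ParallelExtensionsExtras scheduler; the 4× block multiplier and the stub Key/Item/ProcessItemAsync are just placeholders, not your actual code):

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Stub types standing in for the question's Key and Item.
class Key { }
class Item { }

class KeyedProcessor
{
    private readonly ActionBlock<Item>[] _actionBlocks;

    public KeyedProcessor()
    {
        // One scheduler shared by all blocks, capped at ProcessorCount
        // concurrent tasks, so extra blocks don't oversubscribe the CPU.
        var schedulers = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, Environment.ProcessorCount);

        var options = new ExecutionDataflowBlockOptions
        {
            // MaxDegreeOfParallelism stays at its default of 1,
            // so each block still processes its items in FIFO order.
            TaskScheduler = schedulers.ConcurrentScheduler
        };

        // More blocks than cores to smooth out an uneven hash distribution.
        _actionBlocks = new ActionBlock<Item>[Environment.ProcessorCount * 4];
        for (int i = 0; i < _actionBlocks.Length; i++)
        {
            _actionBlocks[i] = new ActionBlock<Item>(item => ProcessItemAsync(item), options);
        }
    }

    public bool ProcessItem(Key key, Item item)
    {
        return _actionBlocks[(uint)key.GetHashCode() % _actionBlocks.Length].Post(item);
    }

    // Placeholder for the question's CPU-bound work.
    private Task ProcessItemAsync(Item item)
    {
        return Task.FromResult(0);
    }
}

The point is that a hot key now only monopolizes its own block's queue; the shared scheduler keeps the actual threads free to drain the other blocks, so the CPU load evens out even when the per-block load doesn't.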

svick
  • How would that solve bad distributions? If I have a "special" hash that gets used more than the others, how is having many ActionBlocks that block each other different from using `% _actionBlocks.Length`? The "special" hash in your case will make its queue bigger relative to the other ones... – i3arnon Jan 09 '14 at 18:12
  • Yes, it will still be bigger than the others, but it's likely to be smaller than with a small number of blocks, because there will be fewer collisions with that special hash. For example, if half of all hashes are 0 and the rest are distributed uniformly, then with 2 blocks, 3/4 of all items will go to block 0. But with 4 blocks, it's just 5/8, and with infinitely many blocks, it would be 1/2. – svick Jan 09 '14 at 18:38
  • But you would still have only 2 threads. One would handle the 5/8 block plus a 1/8 block (6/8 = 3/4) and the other thread would handle the two remaining 1/8 blocks (2/8 = 1/4). Am I missing something? I get doing that when you also increase the thread count, but this code is very CPU bound and AFAIK a single thread per core is recommended. – i3arnon Jan 09 '14 at 20:27
  • No, that's not what I meant: you would have 2 threads, but they would be shared across all 4 blocks. This way, the load on the threads should be spread mostly evenly, even when the load on the blocks isn't. – svick Jan 09 '14 at 21:37
  • Oversubscribing is even required to fully utilize the CPU because a random distribution is not an equi-distribution. Some blocks will be underutilized, some over-utilized. – usr Jan 09 '14 at 21:44