20

An image speaks more than words, so here is basically what I want to achieve :
(I have also used a fruit analogy for the sake of genericity an simplicity)
enter image description here

I've done this kind of stuff many time in the past using different king of .Net classes (BackGroundWOrkers, ThreadPool, Self Made Stuff...)

I am asking here for the sake of advice and to get fresh ideas on how to do this efficiently.
This is a high computing program so I am receiving Millions of (similar in structure but not in content) data, that have to be queued in order to be processed according to its content type. Hence, I want to avoid creating a parallel task for each single data to be processed (this overloads the CPU and is poor design IMHO). That's why I got the idea of having only ONE thread running for EACH data TYPE, dedicated to processing it (knowing that the "Press Juice" method is generic and independent of the fruit to be pressed)

Any Ideas and implementation suggestions are welcome.
I am free to give any further details.

Juergen
  • 12,378
  • 7
  • 39
  • 55
Mehdi LAMRANI
  • 11,289
  • 14
  • 88
  • 130
  • 17
    Your apples and bananas look suspiciously like oranges ;-) – spender Jan 08 '13 at 10:36
  • Instead of using one thread per type, I'd suggest using a threadpool. Also possible is PLinq (default uses all available processors/threads), where you can limit the amount of threads trivially (source.AsParallel().WithDegreeOfParallelism(2) ). See more [here](http://msdn.microsoft.com/en-us/library/dd997425.aspx) – Destrictor Jan 08 '13 at 10:39
  • 1
    One task/thread for each type, and use a blocking queue http://stackoverflow.com/questions/530211/creating-a-blocking-queuet-in-net per task/thread. – Maarten Jan 08 '13 at 10:43
  • 2
    I don't see the advantage of a separate thread for each type. Why not one thread/task that switches on fruitClass and can process any old citrus? That, or FruitClass has an abstract 'Press()' method that is implemented in apples/bananas/oranges so that the threads just need to call 'dequeuedFruit->Press()'. – Martin James Jan 08 '13 at 11:22
  • I would use a Producer/Consumer design pattern, and assign a static number of Consumer threads to each queue. How many Consumer threads you use per queue is implementation specific - maybe you can post a more specific follow up question with more details of the implementation that you have chosen. – mbeckish Jan 08 '13 at 14:03
  • Based on your comment in @tallseth 's answer, it sounds like you are really overcomplicating this. Since each item of a given type needs to be processed sequentially, and assuming you have a static number of types, then just start up one thread per type, and in each thread process the items sequentially. – mbeckish Jan 08 '13 at 14:45

3 Answers3

19

TPL DataFlow seems like a very strong candidate for this.

Take a read of the intro here.

svick
  • 236,525
  • 50
  • 385
  • 514
spender
  • 117,338
  • 33
  • 229
  • 351
  • Hmmm. This looks indeed promising w/ nice capabilities. I am presently having a look at it (though it seems there is an entry cost to familiarize with it) – Mehdi LAMRANI Jan 08 '13 at 10:51
  • 2
    Yes, there is a cost... IMO well worth the effort. It's radically changed the way I think about concurrency and means that I rarely get into the thread-hell that other approaches might produce. I used to use the awesome Microsoft CCR but upgraded to DF because it seems like a lot of the goodness of CCR forms the basis of DataFlow. It allows the same paradigms with code that is considerably easier on the eye. – spender Jan 08 '13 at 11:00
  • 1
    Ok that was pretty awesome I have to say. Not that costly in terms of learning curve after all, and pretty straightforward. Why did this take so long to get something like this ? :) – Mehdi LAMRANI Jan 14 '13 at 13:13
  • @MikaJacobi Sweet! CCR has been around for a while (hiding out in MS Robotics Studio... IMO very ill-publicised relative to the benefit it bought me) and dataflow allowed me to update my CCR code to something more current (but adds more to the mix when coupled with async/await) – spender Jan 14 '13 at 18:55
  • Thank you! TFL works perfect for my task! (Parallel reading & processing of big number independant text articles) – Mike Keskinov Jun 30 '15 at 19:45
10

If all you really want is one thread (or a constant number of threads) for each type of fruit, then the simplest solution might be to use a BlockingCollection for each type of fruit. Your data bus will deliver the fruit to those collections, and your processing threads will take from them. But this means if there are no apples for now, the thread will be blocked, doing nothing.

A more flexible and efficient approach would be to use TPL Dataflow. With that, you don't work with threads or tasks, you work with blocks. For example your Thread C could be represented as a TransformBlock<Apple, AppleJuice>.

By default, each block uses at most one thread, but they can be easily configured to use more of them (by setting MaxDegreeOfParallelism). Also, dataflow blocks work well with the new C# 5.0 async-await, which could be a big advantage.

There are also things you should be careful about. For example, TDF is by default optimized for throughput, not latency. So, if your thread pool is busy and you have lots of oranges incoming and only one apple, it's possible that the apple will be processed only after all of the oranges are. But this can be also fixed by configuring the blocks properly (by setting MaxMessagesPerTask).

svick
  • 236,525
  • 50
  • 385
  • 514
1

I would caution against a "worker thread per data type" approach. This makes the assumption that actual input load will conform to the equivalence classes that are handy for developers. Do you know if bananas are 5x slower to juice than oranges? What happens if every Tuesday is "apple celebration day" and everyone juices more fruit than usual, and all of it is apples?

Running things in parallel is about performance, not about the domain. Don't model it after your domain, model it to provide the lowest average cycle time.

tallseth
  • 3,635
  • 1
  • 23
  • 24
  • It seems like you're assuming there is only one thread per queue. For example, on "apple celebration day", more consumer threads could be assigned to the apple queue. – mbeckish Jan 08 '13 at 14:25
  • That's because I read this part of the question: "That's why I got the idea of having only ONE thread running for EACH data TYPE" – tallseth Jan 08 '13 at 14:27
  • I agree - one thread per data type doesn't seem like a very flexible solution. However, one queue per type can work, with multiple threads per queue. – mbeckish Jan 08 '13 at 14:28
  • Well spotted. Good Point indeed. Actually, I oversimplified my question. Actually it is not just about pressing fruit juice :) In reality data need to be sequentially processed, hence the single thread. – Mehdi LAMRANI Jan 08 '13 at 14:39