
Let's say I have an array of ten million items. I want to do some operation on each item in a foreach loop and then return that item.

foreach(var item in items)
{
  // Let's pretend this is resource-intensive
  item.someIntProp++;
}

Would breaking up the ten million items into, say, 100k-item batches and then running each batch in an async operation be any faster?

The actual scenario is needing to map a bunch of objects from MongoDB BSON values into .NET objects using AutoMapper. No database calls are made during this process; .NET is just converting BsonString to string, etc.

On one hand it would seem to me that "Yes, it will be faster because multiple batches will be handled simultaneously rather than in order." On the other hand, it seems ridiculous that the runtime wouldn't already be optimizing this.

ChrisF
VSO
  • [Which is faster?](http://ericlippert.com/2012/12/17/performance-rant/) – James Thorpe Feb 16 '16 at 16:46
  • @James Thorpe: Read - I think it's a valid question, though I realize it looks like I didn't think about it before posting. I will edit to explain more, though I don't think it's really relevant. – VSO Feb 16 '16 at 16:48
  • 3
    Why don't you try this out yourself? – trashr0x Feb 16 '16 at 16:48
  • It's a bit hard to give a better answer without much more details about the operation you want to perform. Is it CPU bound? Network bound? etc. Also on the face of it, why would breaking something up into smaller chunks be faster? Processing it in one chunk without breaking it up in theory would be fastest. – James Thorpe Feb 16 '16 at 16:49
  • @JamesThorpe Updated with what I believe to be relevant. – VSO Feb 16 '16 at 16:51
  • 2
    Do you have a processor that can handle 100 threads? If not, then having 100 groups of 100K isn't going to gain you 100x performance. This still really comes down to needing to (objectively) measure it in the environment in which it is running. There is a cost to splitting things into chunks, running them concurrently and recombining them. It may be faster. It may not - you need to try it and measure it, in _your_ environment. – James Thorpe Feb 16 '16 at 16:55
  • 6
    .NET's Task Parallel Library (TPL) provides a method to do that easily: `Parallel.ForEach()` (https://msdn.microsoft.com/en-us/library/system.threading.tasks.parallel.foreach(v=vs.110).aspx). Just try it yourself. – Good Night Nerd Pride Feb 16 '16 at 16:56
  • "Lets pretend this is resource-intensive" - it depends on which resource(s). Parallelism _generally_ only benefits from spreading _CPU-intensive_ operations over multiple cores. If your processing is I/O or memory bound, then running batches asynchronously will likely not help. Plus the overhead of parallelism may be more than the gain. The only way to know for sure (as others have stated) is to try it and compare the results. – D Stanley Feb 16 '16 at 16:58
  • Async and threads are two different things. Async is easier to manage. The actual throughput depends on the number of processor cores. However, consider how effective mapping millions of objects directly into memory would be. Have a look at the MongoDB Driver for C# - https://mongodb.github.io/mongo-csharp-driver/ – sarat Feb 16 '16 at 16:58
  • Alright, thanks a lot everyone, gives me something to think about at least. @sarat: Yea, I am using the driver. – VSO Feb 16 '16 at 17:01
  • 1
    what do you mean by "an async operation"? – sara Feb 16 '16 at 17:01

2 Answers


To quickly answer your question: Yes, but don't do it the way you posted.

Here's a good question to read over to give yourself some information on this. Optimal number of threads per core

If you have a requirement to process that many resource-intensive operations, I would suggest creating a new system to manage them. When you have this many asynchronous processes, you're going to have a lot of context switching, which is going to slow this down.

That being said, if this is a singular application where you just need to run it to do some conversions, don't blindly throw things at asynchronous processes. Use the Task Parallel Library (TPL). Doing so will let you manage the tasks, and let you tweak the performance based on a set of inputs (max/min tasks or threads, current # of items in process, etc.).

This will allow you to figure out the best settings for your app, and from there you can re-use this code for other scenarios where you need batch jobs.
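A minimal sketch of the TPL approach, assuming a hypothetical `Item` class with the `someIntProp` field from the question; `MaxDegreeOfParallelism` is one of the knobs you can tune when measuring:

```csharp
using System;
using System.Threading.Tasks;

class Item { public int someIntProp; }

class Program
{
    static void Main()
    {
        var items = new Item[10_000_000];
        for (int i = 0; i < items.Length; i++) items[i] = new Item();

        // Let the TPL partition the work across cores; cap the degree of
        // parallelism so you can experiment with different settings.
        Parallel.ForEach(
            items,
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            item => item.someIntProp++);
    }
}
```

Note that for a per-item operation this cheap, the parallelization overhead may outweigh the gain, which is why measuring in your own environment matters.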

Ryan Ternier

Using a Partitioner, it is easy to experiment with various chunk sizes into which your array is divided before the chunks are processed in parallel. An example is given here:

How can I maximize the performance of element-wise operation on an big array in C#
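As a rough sketch of that approach (the `Item` type and the 100k chunk size are assumptions carried over from the question), `Partitioner.Create` splits the index space into ranges so each task processes a contiguous chunk rather than one element at a time:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Item { public int someIntProp; }

class Program
{
    static void Main()
    {
        var items = new Item[10_000_000];
        for (int i = 0; i < items.Length; i++) items[i] = new Item();

        // Partition indices into 100k-element ranges; each range is
        // handed to a worker as one unit, reducing per-item overhead.
        var ranges = Partitioner.Create(0, items.Length, 100_000);
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                items[i].someIntProp++;
        });
    }
}
```

Varying the range size (the third argument) lets you measure how chunking affects throughput on your hardware.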

weir