-1

I have an IEnumerable with a lot of items that need to be processed in parallel. The items are not CPU intensive. Ideally these items should be executed simultaneously on 100 threads or more.

I've tried to do this with Parallel.ForEach(). That works, but the problem is that new threads are spawned too slowly. It takes too long before Parallel.ForEach() reaches 100 threads. I know there is a MaxDegreeOfParallelism property, but that's a maximum, not a minimum.

Is there a way to execute the foreach immediately on 100 threads? ThreadPool.SetMinThreads is something that we prefer to avoid, because it has an impact on the whole process.

Is a solution possible with a custom partitioner?

TWT
  • 1
    Well, `Parallel.ForEach` uses the thread pool and so plays by its rules, so `SetMinThreads` seems the only option (if you want to use `Parallel.ForEach` specifically). – Evk Mar 02 '17 at 21:11
  • Can you provide a sample demonstrating what your _Processing_ entails? Is it IO bound or CPU bound, is it a multi-step process, etc.? – JSteward Mar 02 '17 at 21:13
  • But, if your tasks are not CPU intensive (and so mostly IO intensive), you can use async/await combined with SemaphoreSlim to limit concurrency, for example like here http://stackoverflow.com/a/10810730/5311735 (but don't use Task.Run(await ...) like there). – Evk Mar 02 '17 at 21:14
  • 7
    If your operations aren't CPU intensive *then you shouldn't be creating additional threads in the first place*. Creating a bunch of threads only to have them sit around doing nothing (because you apparently don't have work for them to do) is going to make your code *slower*, and consume more resources, not faster. – Servy Mar 02 '17 at 21:15
  • 3
    It will take longer to run 100 threads simultaneously on, say, 4 cores, than using 4 threads at a time because of the cost involved in switching between threads. *However* I suggest trying it for yourself, maybe you do have 100+ cores. – Andrew Morton Mar 02 '17 at 21:17
  • You should read about thread starvation first, before starting all 100 threads – VMAtm Mar 02 '17 at 22:21
  • Whatever problem you're dealing with, creating and executing 100 concurrent threads is _not_ the answer. Unless by some miracle you've got a computer that actually has 100 cores and the threads will actually be using the CPU. Otherwise, there are better ways to solve the problem. And that's guaranteed even without you explaining what the actual problem is. Your question seems to be an [XY Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) question. Please explain what it is you're actually trying to do. – Peter Duniho Mar 02 '17 at 23:15
  • `Parallel.ForEach` is meant for data parallelism. If the job isn't CPU intensive, you don't need data parallelism. You should be looking at `async/await` and the TPL Dataflow – Panagiotis Kanavos Mar 03 '17 at 11:15
  • I'm pinging a lot of devices with a timeout of 5 seconds. How would you do that as quickly as possible with only 4 threads (4 cores)? – TWT Mar 03 '17 at 14:18
  • @TWT I'm assuming you're pinging across some sort of LAN. In that case each thread is more than capable of sending many pings in the time required for a single ping to return. A thread can accomplish a great deal of work. What you're looking for is an asynchronous ping. How are you pinging the devices? – JSteward Mar 03 '17 at 21:58
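
Following up on the async/await plus SemaphoreSlim suggestion in the comments: a minimal sketch of that idea, not a definitive implementation. The helper name `Throttled.ForEachAsync` and its parameters are hypothetical (they do not come from this thread); it only assumes that the per-item work can be expressed as a `Func<T, Task>`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Hypothetical helper: runs an async body for every item while allowing
    // at most maxConcurrency bodies to be in flight at the same time.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> body)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                // Wait asynchronously for a free slot; no thread is blocked here.
                await gate.WaitAsync();
                tasks.Add(RunOneAsync(item, body, gate));
            }
            // All bodies have been started; wait for the remaining ones to finish.
            await Task.WhenAll(tasks);
        }
    }

    private static async Task RunOneAsync<T>(T item, Func<T, Task> body, SemaphoreSlim gate)
    {
        try
        {
            await body(item);
        }
        finally
        {
            // Free the slot even if the body throws.
            gate.Release();
        }
    }
}
```

With this shape, up to `maxConcurrency` operations (for example 100 asynchronous pings) are started right away without waiting for the thread pool to grow, because an awaited IO operation does not occupy a thread while it is pending.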

2 Answers

0

I'm pinging a lot of devices with a timeout of 5 seconds. How would you do that as quickly as possible with only 4 threads (4 cores)?

I'm going to assume you're pinging devices on a LAN and each one is identifiable and reachable by an IP address.

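using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;
using System.Threading.Tasks;

namespace PingManyDevices {

    public class DeviceChecker {

        // Starts one asynchronous ping per device (5000 ms timeout) and awaits them all.
        // No thread is blocked while the pings are in flight.
        public async Task<PingReply[]> CheckAllDevices(IEnumerable<IPAddress> devices) {
            var pings = devices.Select(address => new Ping().SendPingAsync(address, 5000));
            return await Task.WhenAll(pings);
        }
        /***
        * Maybe push it a little further
        ***/
        // Same idea, but PLINQ starts the pings from several threads.
        // Renamed so both variants can live in one class; reply order may differ from input order.
        public async Task<PingReply[]> CheckAllDevicesParallel(IEnumerable<IPAddress> devices) {
            var pings = devices.AsParallel().Select(address => new Ping().SendPingAsync(address, 5000));
            return await Task.WhenAll(pings);
        }
    }
}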
JSteward
-2

I've had success using ThreadPool instead of Parallel:

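// Requires System, System.Collections.Generic and System.Threading;
// as an extension method it must be declared inside a static class.
public static void ThreadForEach<T>(this IEnumerable<T> items, Action<T> action)
{
    // One event per item, signaled when that item's work has finished.
    var mres = new List<ManualResetEvent>();

    foreach (var item in items)
    {
        var mre = new ManualResetEvent(false);

        ThreadPool.QueueUserWorkItem((i) =>
        {
            try
            {
                action((T)i);
            }
            finally
            {
                // Always signal completion, even if the action throws.
                mre.Set();
            }
        }, item);

        mres.Add(mre);
    }

    // Block the calling thread until every work item has signaled, then release the handles.
    mres.ForEach(mre =>
    {
        mre.WaitOne();
        mre.Dispose();
    });
}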

In cases where I've had to use this, it ran faster than attempts using Parallel.ForEach. I can only speculate that it is because it attempts to use already existing threads (instead of taking the overhead to create new ones).

Eric
  • You could avoid the discussion of threads entirely, and foreach entirely, using a simple PLINQ query. But if the OP's processing is in any way IO bound then that needs to move towards an async solution. – JSteward Mar 02 '17 at 21:31
  • Parallel uses the ThreadPool as well. Parallel isn't broken. Anyway, your code has a *lot* of issues. First, it uses a `List` instead of a concurrent collection. Second, it's entirely redundant. `Task.WaitAll(items.Select(it => Task.Run(() => action(it))).ToArray())` would do the exact same thing – Panagiotis Kanavos Mar 03 '17 at 11:12
  • Finally, `Parallel` is meant for *data* parallelism. That means you want to process a lot of data, not a lot of tasks. Typically you need no more tasks than there are cores to process a partition of the data each. If the action blocks, it's a misuse of `Parallel` – Panagiotis Kanavos Mar 03 '17 at 11:14
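
For reference, the `Task.Run` one-liner suggested in the comments above could be written out roughly as follows. This is only a sketch; the class name `TaskForEachExtensions` and the method name `TaskForEach` are hypothetical. Like the `ThreadPool.QueueUserWorkItem` version, it queues every item to the thread pool and blocks the caller until all of them finish, so blocking actions still depend on how quickly the pool adds threads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskForEachExtensions
{
    // Hypothetical name: queues one thread-pool task per item and blocks
    // the calling thread until all of them have completed.
    public static void TaskForEach<T>(this IEnumerable<T> items, Action<T> action)
    {
        // ToArray forces the query to run, so every task is started before waiting.
        Task[] tasks = items.Select(item => Task.Run(() => action(item))).ToArray();

        // Throws an AggregateException if any action failed.
        Task.WaitAll(tasks);
    }
}
```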