I am not experienced at making the best use of resources, so I am looking for the best way to run a task efficiently and in parallel.

We have a scenario in which we have to ping millions of systems and receive a response from each. Handling the response takes essentially no computation; the task is network bound.

My current implementation looks like this:

Parallel.ForEach(list, ip =>
{
    try
    {
        // var record = client.QueryAsync(ip);
        var record = client.Query(ip);
        results.Add(record);
    }
    catch (Exception)
    {
        failed.Add(ip);
    }
});

I tested this code for

  • 100 items it takes about 4 secs
  • 1k items it takes about 10 secs
  • 10k items it takes about 80 secs
  • 100k items it takes about 710 secs

I need to process close to 20M queries. What strategy should I use to speed this up further?

Muds
  • Warning: `results` (and `failed`) if it's `List` is **not** thread safe. – Dmitry Bychenko Apr 05 '18 at 10:19
  • yea right, i was using concurrentBag but it was just a desperate measure to speed up, i will revert back to a threadsafe collection – Muds Apr 05 '18 at 10:20
  • If that `client.Query` has async version, then best way would be to use it, since network call is IO task. – Evk Apr 05 '18 at 10:20
  • To start with not Parallel.Foreach, this is not suited to the task you describe – TheGeneral Apr 05 '18 at 10:20
  • What's `client.Query`? – Camilo Terevinto Apr 05 '18 at 10:20
  • Sounds like a job for DataFlow – TheGeneral Apr 05 '18 at 10:21
  • @Evk there is a async method for query available but am not sure how to use it in this case to get things faster – Muds Apr 05 '18 at 10:21
  • `client.Query` queries a system to check whether it is alive or not, and returns a response in either case – Muds Apr 05 '18 at 10:22
  • @TheGeneral can you please link me to a resource explaining what is dataflow ? – Muds Apr 05 '18 at 10:23
  • What I meant was: what's the type and its implementation (if not third-party) – Camilo Terevinto Apr 05 '18 at 10:26
  • it is third party, but i believe the solution wont depend on how ClientClass behaves, would it ? – Muds Apr 05 '18 at 10:27
  • Here is question which lists multiple options (data flow, custom partitioner, semaphore slim): https://stackoverflow.com/q/14673728/5311735. You need to use `QueryAsync` and some big degree of parallelism which you should find in empirical way, such as 100, or maybe even 1000. Unlimited degree might or might not saturate your sockets, depending on how fast `QueryAsync` completes and some other things, so worth trying that too. Note that it will work only if `QueryAsync` uses real async IO and not fakes it (via something like `Task.Run`). – Evk Apr 05 '18 at 10:34
  • And if so happens that `QueryAsync` is not real async - do `ThreadPool.SetMinThreads(100, 8)` (where 100 is parallelism degree you need) and try any of the solutions again. – Evk Apr 05 '18 at 14:26

1 Answer

Here is the problem

Parallel.ForEach runs its work on thread-pool threads. With IO-bound operations, those threads sit blocked while waiting for a device to respond, which ties up resources without doing any useful work.

  • If you have CPU bound code, Parallelism is appropriate;
  • Though if you have IO bound code, Asynchrony is appropriate.

In this case, client.Query is clearly I/O, so the ideal consuming code would be asynchronous.

Since you said there is an async version, you are best to use the async/await pattern and/or some type of limit on concurrent tasks. Another neat solution is the ActionBlock class in the TPL Dataflow library.
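To make the "async/await plus a limit" option concrete, here is a minimal sketch using SemaphoreSlim as the throttle. Everything in it is illustrative: `ThrottledQueries`, `RunAsync`, and the `queryAsync` delegate are names I made up, and the delegate stands in for your `client.QueryAsync`, whose exact signature I don't know. It only helps if that method does real async IO.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledQueries
{
    // Runs queryAsync for every item with at most maxConcurrency calls in flight.
    // queryAsync is a placeholder for something like ip => client.QueryAsync(ip).
    public static async Task<(ConcurrentBag<TResult> Results, ConcurrentBag<TItem> Failed)>
        RunAsync<TItem, TResult>(
            IEnumerable<TItem> items,
            Func<TItem, Task<TResult>> queryAsync,
            int maxConcurrency)
    {
        // Concurrent collections, so adding from many continuations is safe.
        var results = new ConcurrentBag<TResult>();
        var failed = new ConcurrentBag<TItem>();

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();          // wait for a free slot
                try
                {
                    results.Add(await queryAsync(item));
                }
                catch (Exception)
                {
                    failed.Add(item);
                }
                finally
                {
                    gate.Release();              // free the slot
                }
            }).ToList();                         // start all work items

            await Task.WhenAll(tasks);
        }

        return (results, failed);
    }
}
```

Note that this creates one Task per item up front, so for something like 20M addresses you would probably want to feed it in batches rather than in a single call.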


Dataflow example

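// Requires the System.Threading.Tasks.Dataflow library
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static async Task DoWorkLoads(List<IPAddress> addresses)
{
   // Limit how many queries are in flight at once
   var options = new ExecutionDataflowBlockOptions
                     {
                        MaxDegreeOfParallelism = 50
                     };

   var block = new ActionBlock<IPAddress>(MyMethodAsync, options);

   foreach (var ip in addresses)
      block.Post(ip);

   // Signal that no more items are coming, then wait for the queue to drain
   block.Complete();
   await block.Completion;

}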

...

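// static so that the static DoWorkLoads above can reference it
public static async Task MyMethodAsync(IPAddress ip)
{

    try
    {
        // await the async version of the query, not the blocking one
        var record = await client.QueryAsync(ip);
        // note: if results is a List, Add is not thread safe; lock it or use a concurrent collection
        results.Add(record);
    }
    catch (Exception)
    {
        // note: same here, lock it or use a concurrent collection
        failed.Add(ip);
    }
}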

This approach gives you asynchrony and a MaxDegreeOfParallelism limit, and it lets IO be IO without tying up threads or other resources unnecessarily.
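One thing to watch with the Post loop when you have millions of addresses: by default the block's input queue is unbounded, so every item is queued up front. A variation worth trying, sketched under the assumption that you want to cap memory as well, is to set BoundedCapacity and feed the block with SendAsync, which waits when the queue is full. `BoundedDataflow`, `RunAsync`, and the `queryAsync` delegate are hypothetical names; the delegate stands in for your `client.QueryAsync`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BoundedDataflow
{
    // queryAsync is a placeholder for something like ip => client.QueryAsync(ip).
    public static async Task<(ConcurrentBag<TResult> Results, ConcurrentBag<TItem> Failed)>
        RunAsync<TItem, TResult>(
            IEnumerable<TItem> items,
            Func<TItem, Task<TResult>> queryAsync,
            int maxParallelism,
            int boundedCapacity)
    {
        var results = new ConcurrentBag<TResult>();
        var failed = new ConcurrentBag<TItem>();

        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxParallelism,
            // Counts queued items plus items being processed, so keep it
            // at least as large as maxParallelism.
            BoundedCapacity = boundedCapacity
        };

        var block = new ActionBlock<TItem>(async item =>
        {
            try
            {
                results.Add(await queryAsync(item));
            }
            catch (Exception)
            {
                failed.Add(item);
            }
        }, options);

        foreach (var item in items)
        {
            // Unlike Post, SendAsync waits asynchronously while the block is full.
            await block.SendAsync(item);
        }

        block.Complete();
        await block.Completion;
        return (results, failed);
    }
}
```

With a bounded block, Post would simply return false when the queue is full, which is why the sketch uses SendAsync instead.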

*Disclaimer: Dataflow may not be where you want to be; I just thought I'd give you some more information.


Demo here

update

I just did some benchmarking with Parallel.ForEach and Dataflow.

10,000 pings, run multiple times:

Parallel.ForEach = 30 seconds

Dataflow = 10 seconds
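If you want to run this kind of comparison on your own machine, here is a rough timing harness. It is not the code behind the figures above: it replaces the real ping with a simulated delay (Thread.Sleep for the blocking path, Task.Delay for the async path), and `LatencyBenchmark`, `TimeBlocking`, and `TimeAsync` are made-up names. Swap in your real calls to get meaningful numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LatencyBenchmark
{
    // Blocking version: each simulated query holds a thread for latencyMs.
    public static TimeSpan TimeBlocking(int count, int latencyMs)
    {
        var sw = Stopwatch.StartNew();
        Parallel.ForEach(Enumerable.Range(0, count), i => Thread.Sleep(latencyMs));
        return sw.Elapsed;
    }

    // Async version: no thread is held while the simulated query is pending.
    public static async Task<TimeSpan> TimeAsync(int count, int latencyMs, int maxConcurrency)
    {
        var sw = Stopwatch.StartNew();
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = Enumerable.Range(0, count).Select(async i =>
            {
                await gate.WaitAsync();
                try
                {
                    await Task.Delay(latencyMs);   // stand-in for the network call
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }
        return sw.Elapsed;
    }
}
```

Run each measurement several times and with realistic item counts; the gap depends heavily on the thread pool's size and on how long each call really waits.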

TheGeneral
  • Thanks, will try this and come back with figures, thanks for pointing me to new things! (new to me) – Muds Apr 05 '18 at 10:36
  • i am not able to get the flow right, code doesnt awaits at block.Completion – Muds Apr 05 '18 at 10:59
  • @Muds 2 secs i havent tested this, ill just check it now – TheGeneral Apr 05 '18 at 11:01
  • also, will using parallel.foreach to post improve performance ? – Muds Apr 05 '18 at 11:15
  • @Muds Yeah any asynchronous parallel solution will improve performance . 2 secs im just getting a demo ready for you, it should be waiting – TheGeneral Apr 05 '18 at 11:16
  • .wait() does wait for task to complete but am afraid times have increased, i will try to drill down and come back – Muds Apr 05 '18 at 13:04
  • @mud i just did a test with a 10,000 pings both with parallel foreach and dataflow async and dataflow was 3 times faster. constantly run data flow is about 10 seconds, and foreach is 30 seconds – TheGeneral Apr 05 '18 at 13:22
  • let me get some figures and will post my demo – Muds Apr 05 '18 at 13:27
  • @Muds it's probably the case that your async version of the method isn't truly async and just calls Task.Run – TheGeneral Apr 05 '18 at 13:28
  • Thanks for your help bro, i figure out results are consistent if i just use wait, seems there are issues with third party. also, now i have fair idea about how to deal with io intensive processes vs cpu intensive processes thanks again – Muds Apr 05 '18 at 14:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/168321/discussion-between-muds-and-thegeneral). – Muds Apr 05 '18 at 14:21
  • i have observed that just using async await and not using action block speeds up the process – Muds Apr 06 '18 at 13:48
  • @Muds yes it would, there is less overhead, you need to work this out for your own situation – TheGeneral Apr 06 '18 at 14:46