I'm running into a problem when pulling data down from a service. I'm using the following call to Parallel.ForEach:

Parallel.ForEach(idList, id => GetDetails(id));

GetDetails(id) calls a web service that takes roughly half a second per call and adds the resulting details to a shared collection (AllDetails, a Dictionary).

static void GetDetails(string id)
{
    var details = WebService.GetDetails(Key, Secret, id);
    AllDetails.Add(id, details);
}

The problem is, I know the service can handle more calls, but I can't seem to figure out how to get my process to ramp up more calls, UNLESS I split my list and open the process multiple times. In other words, if I open this app GetDetails.exe 4 times and split the number of IDs into each, I cut the run time down to 25% of the original. This tells me that the possibility is there but I am unsure how to achieve it without ramping up the console app multiple times.

Hopefully this is a pretty simple issue for folks that are more familiar with parallelism, but in my research I've yet to solve it without running multiple instances.

Zach Becknell
  • You need to show us the code for `GetDetails` if you expect any useful help. – Scott Chamberlain Jun 08 '17 at 20:12
  • It is literally a query to a web service, but I will add that detail. The only reason I'm asking this question is because I know it is technically possible by spinning up multiple instances of the console, so I was trying to find out whether or not a single instance can replicate that. – Zach Becknell Jun 08 '17 at 20:18
  • You need to include more information, what is the type of `AllDetails` and `WebService`? How and where are they declared? Are the functions you are using on them thread safe? – Scott Chamberlain Jun 08 '17 at 20:25
  • Understood, and what I can tell you is that I'm calling the non-async version of the method from the service reference. Would switching to the async version help at all? – Zach Becknell Jun 08 '17 at 20:25
  • `AllDetails` is a `Dictionary` (updated the code to reflect that) and `WebService` is a wrapper around a service reference. The wrapper basically does some conversion of the returned data before returning it. We tried executing the service call directly without any of these conversions and the same performance was the result. – Zach Becknell Jun 08 '17 at 20:37
  • So, neither of those two classes is safe to use from multiple threads at the same time, so you can't use them in a `Parallel.ForEach`. – Scott Chamberlain Jun 08 '17 at 20:38
  • So it's essentially dangerous to run the process the way I've been running it? – Zach Becknell Jun 08 '17 at 20:41

1 Answer

A few possibilities:

  • There's a chance that WebService.GetDetails(...) is using some kind of mechanism to ensure that only one web request actually happens at a time.
  • .NET itself may be limiting the number of connections, either to a given host or in general; see this question's answers for details about these kinds of problems.
  • If WebService.GetDetails(...) reuses some kind of identifier like a session key, the server may be limiting the number of concurrent requests that it accepts for that one session.
  • It's generally a bad idea to try to solve performance issues by hammering the server with more concurrent requests. If you control the server, then you're causing your own server to do way more work than it needs to. If not, you run the risk of getting IP-banned for abusing their service. It's worth checking whether the service you're accessing offers a way to batch your requests.
  • As Scott Chamberlain mentioned in comments, you need to be careful with parallel processes because accessing structures like Dictionary<> from multiple threads concurrently can cause sporadic, hard-to-track-down bugs. You'd probably be better off using async requests rather than parallel threads. If you're careful about your awaits, you can have multiple requests be active concurrently while still using just a single thread at a time.
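
The async approach in the last bullet could look roughly like the sketch below. It is only an illustration under stated assumptions: `FakeGetDetailsAsync` is a hypothetical stand-in for an async version of the service call (the real wrapper's API is not shown in the question), and `LoadAllAsync` and `maxConcurrency` are made-up names. A `SemaphoreSlim` caps how many requests are in flight, and a `ConcurrentDictionary` is used because in a console app the continuations may resume on different thread-pool threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDetailsLoader
{
    // Hypothetical stand-in for an async web service call; not the real service API.
    public static async Task<string> FakeGetDetailsAsync(string id)
    {
        await Task.Delay(50); // simulate network latency
        return "details-" + id;
    }

    // Calls fetch for every id, keeping at most maxConcurrency requests in flight.
    public static async Task<ConcurrentDictionary<string, string>> LoadAllAsync(
        IEnumerable<string> ids,
        Func<string, Task<string>> fetch,
        int maxConcurrency)
    {
        var results = new ConcurrentDictionary<string, string>();
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = ids.Select(async id =>
            {
                await gate.WaitAsync(); // wait for a free slot
                try
                {
                    results[id] = await fetch(id);
                }
                finally
                {
                    gate.Release(); // free the slot even if fetch throws
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }
        return results;
    }
}

public static class AsyncDemo
{
    public static void Main()
    {
        var ids = Enumerable.Range(1, 20).Select(i => i.ToString()).ToList();
        var all = AsyncDetailsLoader
            .LoadAllAsync(ids, AsyncDetailsLoader.FakeGetDetailsAsync, 8)
            .GetAwaiter().GetResult();
        Console.WriteLine(all.Count);
    }
}
```

Keeping `maxConcurrency` modest also addresses the concern about hammering the server: the cap is explicit and easy to lower if the service starts rejecting requests.
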
StriplingWarrior
  • The second bullet point was my issue. The answer was ultimately to up the number of connections to the host in the `<connectionManagement>` element in `<system.net>`. I'm not sure what the default is, but upping `maxconnection` cut the run time by 80%. Also, I will say that this is not something that will be run more than once -- we just don't have a batch means of doing this and will need to do a one-time load using this less-than-optimal solution (now much improved by your answer). – Zach Becknell Jun 08 '17 at 21:04
  • Also regarding thread safety, I've switched the `Dictionary<>` out for `ConcurrentDictionary<>`. Thank you for your help. – Zach Becknell Jun 09 '17 at 12:09
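
Putting the two fixes from these comments together, a minimal sketch might look like the following. It assumes .NET Framework, where `ServicePointManager.DefaultConnectionLimit` is the in-code counterpart of the `maxconnection` config setting and where client apps typically default to two concurrent connections per host. `LoadAll`, `maxParallel`, and the stubbed `fetch` delegate are hypothetical names standing in for the real `WebService.GetDetails(Key, Secret, id)` call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDetailsLoader
{
    // Calls fetch for every id on up to maxParallel threads at once.
    public static ConcurrentDictionary<string, string> LoadAll(
        IEnumerable<string> ids,
        Func<string, string> fetch,
        int maxParallel)
    {
        // Raise the per-host connection cap; set it before the first request is made.
        ServicePointManager.DefaultConnectionLimit = maxParallel;

        // Thread-safe replacement for the shared Dictionary in the question.
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(ids, options, id =>
        {
            results.TryAdd(id, fetch(id));
        });
        return results;
    }
}

public static class ParallelDemo
{
    public static void Main()
    {
        var ids = Enumerable.Range(1, 20).Select(i => i.ToString()).ToList();

        // Stub that sleeps instead of calling a real web service.
        Func<string, string> fetch = id =>
        {
            Thread.Sleep(50);
            return "details-" + id;
        };

        var all = ParallelDetailsLoader.LoadAll(ids, fetch, 8);
        Console.WriteLine(all.Count);
    }
}
```

Without the higher connection limit, extra threads would mostly queue behind the same few connections, which matches the behaviour described in the question.
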