4

I have a program that needs to get some data from an Atom feed. I tried two approaches and neither of them worked well.

I've used WebClient to synchronously download all the posts I need, but as there are a few thousand and the service is slow it takes many hours.

I've tried (for the first time) async/await, the new HttpClient and Task.WhenAll. Unfortunately that results in thousands of requests hitting the service and bringing it down.

How can I run, say, 100 requests in parallel?

NotMe
TEst16
  • Have you tried HttpWebRequest using the AsyncCallback with BeginGetResponse? – ron tornambe Mar 18 '13 at 17:32
  • How is that going to help me? – TEst16 Mar 18 '13 at 17:34
  • From your question, I thought you wanted to make asynchronous calls? Or do you think this technique (recommended by Google) will also overwhelm the service? – ron tornambe Mar 18 '13 at 17:40
  • possible duplicate of [System.Threading.Tasks - Limit the number of concurrent Tasks](http://stackoverflow.com/questions/2898609/system-threading-tasks-limit-the-number-of-concurrent-tasks) – mbeckish Mar 18 '13 at 17:44
  • Is this a third-party feed or yours? If it is third-party, look into their services: they may offer one that lets you request a specific count and perhaps an offset, which would give you a paging-style feature. – Justin Mar 18 '13 at 18:08
  • I've looked into the SemaphoreSlim option, but it's more general and complex than what I need. I can partition my list of URLs to download into groups, I just don't know how to express that in C#. – TEst16 Mar 18 '13 at 18:24

2 Answers

2

You could use Parallel with ParallelOptions.MaxDegreeOfParallelism.

ParallelOptions.MaxDegreeOfParallelism Property
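
As a rough sketch of the first option: Parallel.ForEach accepts a ParallelOptions instance whose MaxDegreeOfParallelism caps how many iterations run at once. The class name ParallelDownloader and the fetch delegate are hypothetical, introduced only for illustration; fetch stands in for whatever synchronous download call you already have (for example, a WebClient.DownloadString wrapper).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelDownloader
{
    // fetch is a hypothetical synchronous download delegate supplied by the caller.
    public static IDictionary<string, string> DownloadAll(
        IEnumerable<string> urls, Func<string, string> fetch, int maxParallel = 100)
    {
        // Thread-safe map from URL to downloaded content.
        var results = new ConcurrentDictionary<string, string>();

        // Upper bound on the number of iterations running concurrently.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(urls, options, url =>
        {
            results[url] = fetch(url);
        });

        return results;
    }
}
```

MaxDegreeOfParallelism is only an upper bound. Each running iteration blocks a thread while it waits on the network, so the thread pool may take a while to reach the limit.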

Or a BlockingCollection with a bounded collection size

BlockingCollection Overview

I would recommend the BlockingCollection
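
One possible shape for the BlockingCollection approach, as a sketch only: a bounded queue of URLs feeds a fixed number of worker tasks. The names BoundedQueueDownloader, download, workerCount and queueCapacity are assumptions made for this example. Note that the number of workers is what limits concurrent requests here; the bounded capacity only makes the producer wait when the queue is full.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedQueueDownloader
{
    // download is a hypothetical synchronous download delegate supplied by the caller.
    public static List<string> Run(
        IEnumerable<string> urls, Func<string, string> download,
        int workerCount = 100, int queueCapacity = 100)
    {
        var results = new ConcurrentBag<string>();

        using (var queue = new BlockingCollection<string>(queueCapacity))
        {
            // Each worker pulls URLs until the queue is marked complete and drained.
            var workers = Enumerable.Range(0, workerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    foreach (var url in queue.GetConsumingEnumerable())
                    {
                        results.Add(download(url));
                    }
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            // Add blocks while the queue already holds queueCapacity items.
            foreach (var url in urls)
            {
                queue.Add(url);
            }
            queue.CompleteAdding();

            Task.WaitAll(workers);
        }

        // Results arrive in completion order, not input order.
        return results.ToList();
    }
}
```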

paparazzo
  • Thanks Blam. ParallelOptions.MaxDegreeOfParallelism works fine. It seems to be 4 times slower in my case than async/await and HttpClient.GetAsync. I guess switching between the threads is expensive compared to the single IO completion port thread. – TEst16 Mar 18 '13 at 19:21
  • Give BlockingCollection a try. But it may not be any better. Not really sure how it handles threads. For me, I like the syntax. – paparazzo Mar 18 '13 at 19:30
1

Sounds like you have a solution already in that you can get a lot done at once. I'd suggest just adding another layer on top of that which just loops through all of the posts, but only processes them 100 at a time.

Right now you might have DownloadAll(List ListofPosts), and inside DownloadAll you probably have a wait-all at the end.

Instead, loop over the list in steps of 100 and call DownloadAll(ListofPosts.Skip(offset).Take(100)) for each offset (0, 100, 200, and so on), so that the final partial chunk is included as well.

Obviously not real code there, but then you can do chunks of 100 with little change to your main function.
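
A minimal C# sketch of that chunked loop, assuming an asynchronous per-URL download. BatchDownloader and fetchAsync are hypothetical names for this example; fetchAsync stands in for something like an HttpClient call you already have.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchDownloader
{
    // fetchAsync is a hypothetical asynchronous download delegate supplied by the caller.
    public static async Task<List<string>> DownloadInBatchesAsync(
        IReadOnlyList<string> urls, Func<string, Task<string>> fetchAsync, int batchSize = 100)
    {
        var results = new List<string>(urls.Count);

        // Step through the list one chunk at a time; the last chunk may be smaller.
        for (int offset = 0; offset < urls.Count; offset += batchSize)
        {
            var batch = urls.Skip(offset).Take(batchSize).Select(fetchAsync);

            // Start every download in this chunk, then wait for all of them to finish.
            results.AddRange(await Task.WhenAll(batch));
        }

        return results;
    }
}
```

Each chunk waits for its slowest request before the next chunk starts, so the number of requests in flight drops toward the end of every batch.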

david
  • May not be so simple if it is a third-party feed. If they only allow a "DownloadAll", not likely but you never know, you can't really return a subset properly. – Justin Mar 18 '13 at 18:11
  • Doesn't sound very efficient. I'd like to keep 100 posts downloading at any time rather than download them in batches. – TEst16 Mar 18 '13 at 18:22
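
For the behavior asked for in that last comment (roughly 100 downloads in flight at all times, without fixed batches), one option is the SemaphoreSlim gate mentioned in the question's comments. This is only a sketch: ThrottledDownloader, fetchAsync and maxInFlight are names assumed for the example, and fetchAsync stands in for a real call such as HttpClient.GetStringAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloader
{
    // fetchAsync is a hypothetical asynchronous download delegate supplied by the caller.
    public static async Task<string[]> DownloadAllAsync(
        IEnumerable<string> urls, Func<string, Task<string>> fetchAsync, int maxInFlight = 100)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            var tasks = urls.Select(async url =>
            {
                // Wait for a free slot; at most maxInFlight callers get past this point.
                await gate.WaitAsync();
                try
                {
                    return await fetchAsync(url);
                }
                finally
                {
                    // Free the slot so the next pending download can start immediately.
                    gate.Release();
                }
            }).ToList();

            // Results come back in the same order as the input URLs.
            return await Task.WhenAll(tasks);
        }
    }
}
```

As soon as one request finishes, the next one starts, so the service sees a steady load of up to maxInFlight requests.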