I am trying to query an API that is accessible via HTTP (no authorization). To speed things up, I tried a Parallel.ForEach loop, but the longer it runs, the more errors pop up.

More and more requests fail. I know the API provider isn't limiting me, because I can request the very same failed URLs in my browser. The failed URLs are also different on each run, so malformed requests don't seem to be the cause.

The error doesn't seem to occur when I use a single-threaded foreach loop.

My malfunctioning loop is below:

    Parallel.ForEach(this.urlArray, singleUrl => {
        this.apiResponseBlob = new System.Net.WebClient().DownloadString(singleUrl);
        this.responsesDictionary.Add(singleUrl, apiResponseBlob);
    });

A normal foreach loop works fine but is very slow:

    foreach (string singleUrl in this.urlArray) {
        this.apiResponseBlob = new System.Net.WebClient().DownloadString(singleUrl);
        this.responsesDictionary.Add(singleUrl, apiResponseBlob);
    }

Also: I used to have a solution in PHP where I spawned several "fetchers" simultaneously, and it never hung. It seems strange to me that PHP would handle multithreaded retrieval better than C#, so I must be missing something.

What is the fastest way to query the API without these strange failures?

gggggggg5555
  • Wouldn't it be easier to use the [async](http://msdn.microsoft.com/en-us/library/system.net.webclient.downloadstringasync(v=vs.110).aspx) version of that call? – rene Dec 08 '14 at 12:44
  • Do you mean together with Parallel.ForEach or with a normal foreach loop? – gggggggg5555 Dec 08 '14 at 12:58
  • With the normal foreach, and let the WebClient instance handle the completion. – rene Dec 08 '14 at 12:59

1 Answer

Hi, did you try to speed up your code with async downloads, as in this question (see the accepted answer)?

DownloadStringAsync wait for request completion

You could loop through your URIs and get a callback for each successful download.
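
A minimal sketch of that callback approach, not a definitive implementation. It assumes a console application (no UI synchronization context, otherwise the blocking wait at the end could deadlock) and well-formed absolute URLs. The names `AsyncFetchSketch` and `FetchAll` are hypothetical; `WebClient.DownloadStringAsync` and the `DownloadStringCompleted` event are the real API. A `ConcurrentDictionary` is used because the callbacks may run on different thread-pool threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Threading;

public static class AsyncFetchSketch
{
    // Starts one asynchronous download per URL and blocks until every callback has fired.
    public static ConcurrentDictionary<string, string> FetchAll(string[] urls)
    {
        var responses = new ConcurrentDictionary<string, string>();
        using (var pending = new CountdownEvent(urls.Length))
        {
            foreach (string url in urls)
            {
                // One WebClient per request: a single instance cannot run concurrent operations.
                var client = new WebClient();
                client.DownloadStringCompleted += (sender, e) =>
                {
                    // e.Result throws if the request failed, so check e.Error first.
                    if (e.Error == null && !e.Cancelled)
                    {
                        responses[(string)e.UserState] = e.Result;
                    }
                    ((WebClient)sender).Dispose();
                    pending.Signal();
                };
                // The URL is passed as the user token so the callback knows which request finished.
                client.DownloadStringAsync(new Uri(url), url);
            }
            pending.Wait();
        }
        return responses;
    }
}
```

Failed requests are simply missing from the returned dictionary in this sketch; a real program would probably want to record `e.Error` somewhere.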

EDIT: I have seen that you use

this.apiResponseBlob = DL

When you use multithreading, every thread tries to write to that one variable. This could be a reason for your bug. Try using a local variable instead (a separate instance per thread), or use

lock{}

so that only one thread can write to the variable at a time. http://msdn.microsoft.com/de-de/library/c5kehkcz.aspx

Like this:

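    Parallel.ForEach(this.urlArray, singleUrl =>
    {
        // Local variable, so threads no longer overwrite a shared field.
        var apiResponseBlob = new System.Net.WebClient().DownloadString(singleUrl);
        // Lock on one shared object; a lock on each URL string would not serialize the writes.
        lock (this.responsesDictionary)
        {
            this.responsesDictionary.Add(singleUrl, apiResponseBlob);
        }
    });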
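
As an alternative to locking around a plain dictionary, a thread-safe collection can take the writes directly. The following is only a sketch under assumptions: `ParallelFetchSketch`, `FetchAll`, and `DownloadWithWebClient` are hypothetical names, and the download function is passed in as a parameter so the loop can be tried without a network. `ConcurrentDictionary` is a swapped-in technique here, not something the original code used.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

public static class ParallelFetchSketch
{
    // Runs `download` for every URL in parallel and collects the results.
    public static ConcurrentDictionary<string, string> FetchAll(
        IEnumerable<string> urls, Func<string, string> download)
    {
        var responses = new ConcurrentDictionary<string, string>();
        Parallel.ForEach(urls, url =>
        {
            // Local result per iteration; no shared field is written.
            string body = download(url);
            // The indexer is thread-safe and overwrites on a duplicate URL
            // instead of throwing like Dictionary.Add.
            responses[url] = body;
        });
        return responses;
    }

    // Downloader matching the question: one short-lived WebClient per request.
    public static string DownloadWithWebClient(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

Usage would look like `var responses = ParallelFetchSketch.FetchAll(urlArray, ParallelFetchSketch.DownloadWithWebClient);`. Any exception thrown by a download surfaces from `Parallel.ForEach` wrapped in an `AggregateException`.
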
Bjego
  • What about when the WebClient is inside another class? Can I still use async by passing that class just one URL? – gggggggg5555 Dec 08 '14 at 13:13
  • Yes, you can, but you have to use an event for the response. Something like this (written from memory, not tested): `public class MyDownloader { public event EventHandler DlFinished; public void DlAsync(Uri url) { var client = new WebClient(); client.DownloadStringCompleted += (sender, e) => { doSomeThing(e.Result); if (this.DlFinished != null) this.DlFinished(this, EventArgs.Empty); }; client.DownloadStringAsync(url); } }` Usage from another class: `var loader = new MyDownloader(); loader.DlFinished += CallbackFunction; loader.DlAsync(uri);` – Bjego Dec 08 '14 at 13:22
  • Hope this is readable ;) More info about custom events here: http://stackoverflow.com/questions/6644247/simple-custom-event – Bjego Dec 08 '14 at 13:29