4

I have a method that attempts to download data from several URLs in Parallel, and return an IEnumerable of Deserialized types

The method looks like this:

    public IEnumerable<TContent> DownloadContentFromUrls(IEnumerable<string> urls)
    {
        var list = new List<TContent>();

        Parallel.ForEach(urls, url =>
        {
            lock (list)
            {
                _httpClient.GetAsync(url).ContinueWith(request =>
                {
                    var response = request.Result;
                    //todo ensure success?

                    response.Content.ReadAsStringAsync().ContinueWith(text =>
                    {
                        var results = JObject.Parse(text.Result)
                            .ToObject<IEnumerable<TContent>>();

                        list.AddRange(results);
                    });
                });
            }
        });

        return list;
    }

In my unit test (I stub out _httpClient to return a known set of text) I basically get

Sequence contains no elements

This is because the method is returning before the tasks have completed.

If I add .Wait() on the end of my .ContinueWith() calls, it passes, but I'm sure that I'm misusing the API here...

Alex
  • 37,502
  • 51
  • 204
  • 332
  • Are you sure you want to use GetAsync from a Parallel? The foreach is already parallelized, right? – Wouter Simons Oct 15 '13 at 14:40
  • 1
    Do you want to block the call to your `DownloadContentFromUrls` function until every download has been completed? If so then you don't really need ContinueWith, you can just access the Result property with a statement like `var results = _httpClient.GetAsync(url).Result;` – Trevor Elliott Oct 15 '13 at 14:41
  • well, thats what i thought... there isn't a "GetSync" though..? – Alex Oct 15 '13 at 14:42
  • @TrevorElliott that's an excellent point, i think i'm getting confused around what i need to block and what i don't.... – Alex Oct 15 '13 at 14:44
  • 1
    Although this isn't your question, do you really want to `lock` list that far up. I think it will interfere with your concurrency. Why not `lock` it right before `List.AddRange`? – Harrison Oct 15 '13 at 14:45
  • @Harrison Actually, that lock is held only for a short time, because the lambda executes asynchronously. But this also means that the list won't be locked when `AddRange()` is called. – svick Oct 15 '13 at 18:29

1 Answers1

9

If you want a blocking call which downloads in parallel using the HttpClient.GetAsync method then you should implement it like so:

public IEnumerable<TContent> DownloadContentFromUrls<TContent>(IEnumerable<string> urls)
{
    var queue = new ConcurrentQueue<TContent>();

    using (var client = new HttpClient())
    {
        Task.WaitAll(urls.Select(url =>
        {
            return client.GetAsync(url).ContinueWith(response =>
            {
                var content = JsonConvert.DeserializeObject<IEnumerable<TContent>>(response.Result.Content.ReadAsStringAsync().Result);

                foreach (var c in content)
                    queue.Enqueue(c);
            });
        }).ToArray());
    }

    return queue;
}

This creates an array of tasks, one for each Url, which represents a GetAsync/Deserialize operation. This is assuming that the Url returns a Json array of TContent. An empty array or a single member array will deserialize fine, but not a single array-less object.

Trevor Elliott
  • 11,292
  • 11
  • 63
  • 102
  • Is that any different to using Paralell.ForEach with .Result as talked about in comments? - looking at this - http://stackoverflow.com/questions/5009181/parallel-foreach-vs-task-factory-startnew – Alex Oct 15 '13 at 15:17
  • They are the same but the benefit of using an array of tasks is you can simply change WaitAll to WhenAll and have your outer method return asynchronously. You can also make a cancellable chain of calls. eg. when your method is cancelled it can easily cancel the underlying HttpClient operations. – Trevor Elliott Oct 15 '13 at 15:31
  • Parallel.ForEach is better for computational problems. – Trevor Elliott Oct 15 '13 at 15:32
  • I get a compilation error on the .Select for IEnumerable urls - The type arguments for method 'System.Linq.Enumerable.Select(System.Collections.Generic.IEnumerable, System.Func)' cannot be inferred from the usage. Try specifying the type arguments explicitly. – Alex Oct 16 '13 at 16:17
  • Compiles fine for me. Are you sure you copied it exactly? – Trevor Elliott Oct 16 '13 at 16:29
  • 1
    I tested it with both .NET 4.5 and 4.0. Usually that error means you are missing the `return` keyword. I can't see it not working if you copy and paste it. – Trevor Elliott Oct 16 '13 at 17:13
  • was indeed missing the return. Thanks – Alex Oct 17 '13 at 08:29
  • +1 Perfect! Though I had to change IEnumerable to TContent to make it work in my case – Rashmi Pandit Nov 06 '14 at 21:31
  • thanks for sharing this, is there a limit to the number of connections you can run simultaneously this way when using HttpClient? will the instance of the HttpClient be shared without issue across simultaneous requests? if there are too many will they just queue? – SelAromDotNet Aug 05 '17 at 02:28