I want to download the page content for a list of URLs (about 10,000 URLs).

  1. Is HttpClient the fastest and cleanest way to do this (compared with HttpWebRequest or WebClient)?
  2. If speed is the priority, is the TPL the best approach?

I'm looking for something like this, but really fast and clean (for 10,000 requests):

    public List<string> GetContentListOfUrlList(List<Uri> uriList, int maxSimultaneousRequest)
    {
        // request the URLs in the fastest way

    }

I hope it's clearer like this ;)

EDIT 2: Based on noseratio's other post, is this the best solution?

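    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    public async Task<List<string>> DownloadAsync(List<Uri> urls, int maxDownloads)
    {
        // Thread-safe collection; results arrive in completion order, not input order.
        var concurrentQueue = new ConcurrentQueue<string>();

        // The semaphore caps the number of requests in flight at maxDownloads.
        using (var semaphore = new SemaphoreSlim(maxDownloads))
        using (var httpClient = new HttpClient())
        {
            var tasks = urls.Select(async (url) =>
            {
                await semaphore.WaitAsync();
                try
                {
                    var data = await httpClient.GetStringAsync(url);
                    concurrentQueue.Enqueue(data);
                }
                finally
                {
                    // Always free the slot, even if the request throws.
                    semaphore.Release();
                }
            });

            // Wait for every download before disposing the client and semaphore.
            await Task.WhenAll(tasks.ToArray());
        }
        return concurrentQueue.ToList();
    }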

Questions

  1. ConfigureAwait? Should I use

    var data = await httpClient.GetStringAsync(url).ConfigureAwait(false);

or

    var data = await httpClient.GetStringAsync(url);

  2. ServicePointManager.DefaultConnectionLimit? Should I change this property as well?
Julian50
  • possible duplicate of [How can I limit Parallel.ForEach?](http://stackoverflow.com/questions/9290498/how-can-i-limit-parallel-foreach) – joell Jan 27 '15 at 07:45
  • Please clarify your question, perhaps with a code example. Currently, I have no clue what you *actually* want to do. – Yuval Itzchakov Jan 27 '15 at 07:48
  • @iVision I edited my question to be clearer – Julian50 Jan 27 '15 at 08:01
  • Don't use TPL for naturally async I/O-bound tasks, rather use async APIs. E.g.: http://stackoverflow.com/a/22493662/1768303 – noseratio Jan 27 '15 at 08:09
  • @Noseratio `async/await` is more-or-less syntactic sugar over TPL. In this particular case, the aim is to fire many requests concurrently, not await them one by one. At best, after calling `GetStringAsync` 1000 times, you get to `await Task.WhenAll(theCalls)` – Panagiotis Kanavos Jan 27 '15 at 08:11
  • @PanagiotisKanavos, feel free to click the link I posted to see what I meant. It doesn't do it one by one. It's about using *a naturally async, non-blocking API* like `HttpClient.GetStringAsync` - as opposed to calling a blocking API like `WebClient.DownloadString` in parallel with `Parallel.ForEach`. – noseratio Jan 27 '15 at 08:18
  • @PanagiotisKanavos *async/await is more-or-less syntactic sugar over TPL* Not really. `async-await` isn't bound to the TPL at all, and has nothing to do with parallelism. – Yuval Itzchakov Jan 27 '15 at 08:18
  • @Julian50 What have you tried so far? You can't simply ask for code; you need to show an attempt to begin with. – Yuval Itzchakov Jan 27 '15 at 08:19
  • @Julian50, to address your edit, it's just one way of doing that. You could also use TPL Dataflow as [another answer](http://stackoverflow.com/a/22492731/1768303) suggests. Or, you could also use Reactive Extensions. We can't tell you what's best for you. – noseratio Jan 27 '15 at 09:19

1 Answer

The ParallelOptions.MaxDegreeOfParallelism property specifies the maximum number of concurrent operations:

    Parallel.ForEach(list,
        new ParallelOptions { MaxDegreeOfParallelism = 4 },
        DownloadPage);

Reference: MaxDegreeOfParallelism
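
To make the effect of the limit easier to see, here is a minimal sketch (not part of the original answer) that runs an arbitrary action through Parallel.ForEach with a MaxDegreeOfParallelism cap and reports the highest concurrency it observed. The class name `BoundedParallelDemo` and the method `RunAndMeasurePeak` are hypothetical names made up for illustration; the `work` delegate stands in for something like `DownloadPage`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class BoundedParallelDemo
{
    // Runs `work` once per item with at most `maxParallel` invocations
    // executing at the same time, and returns the peak concurrency observed.
    public static int RunAndMeasurePeak<T>(IEnumerable<T> items, int maxParallel, Action<T> work)
    {
        var gate = new object();
        int current = 0; // invocations running right now
        int peak = 0;    // highest value `current` has reached

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(items, options, item =>
        {
            lock (gate)
            {
                current++;
                peak = Math.Max(peak, current);
            }
            try
            {
                work(item);
            }
            finally
            {
                lock (gate) { current--; }
            }
        });

        return peak;
    }
}
```

Called as `BoundedParallelDemo.RunAndMeasurePeak(list, 4, DownloadPage)`, the returned peak should never exceed 4. Note that the work delegate here is synchronous, so each running item occupies a thread for its whole duration; as the comments on the question point out, that is a worse fit for I/O-bound downloads than a naturally asynchronous API.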

joell