
I have software that loads a list of up to 10,000 URLs which are used to scrape insurance prices for my website.

At the moment a single thread loads each URL from the list in turn and fetches the data. What I want to do is run 20-30 requests at a time. What's the best way to launch 20-30 threads at once whilst looping through the URLs in the text file?

James Jeffery
  • You could load the entire list at one go and hand off a chunk from it (say 50 URLs) to each new thread that you spawn, till you reach the max thread count (say 20 threads). Tweak the numbers as necessary. – Bhargav Feb 28 '12 at 14:34
  • I would probably scale back your ambitions; with that number of outbound requests, any website will rack up bandwidth charges at an astronomical rate. – Lloyd Feb 28 '12 at 14:40
  • May be of interest: http://stackoverflow.com/questions/8853907/limit-the-number-of-parallel-threads-in-c-sharp/8853978#8853978 – Jeb Feb 28 '12 at 14:45
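A rough sketch of the chunking idea from the first comment, assuming C# and plain `System.Threading` threads. The list is cut into chunks (say 50 URLs each) and at most 20 threads each keep claiming the next unclaimed chunk until none remain. This is one work-queue reading of the comment, not necessarily exactly what was meant, and `ChunkedScraper`, `Chunk`, `Run` and the `scrapeUrl` callback are hypothetical names; your own scraper would be passed in as `scrapeUrl`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: the names and the work-queue design are assumptions,
// not code from this thread.
public static class ChunkedScraper
{
    // Splits the URL list into consecutive chunks of at most chunkSize items.
    public static List<string[]> Chunk(string[] urls, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var chunks = new List<string[]>();
        for (int start = 0; start < urls.Length; start += chunkSize)
        {
            int length = Math.Min(chunkSize, urls.Length - start);
            var chunk = new string[length];
            Array.Copy(urls, start, chunk, 0, length);
            chunks.Add(chunk);
        }
        return chunks;
    }

    // Starts at most maxThreads threads; each one repeatedly claims the next
    // unprocessed chunk and calls scrapeUrl (your scraper) for every URL in it.
    // Exceptions thrown by scrapeUrl are not caught here.
    public static void Run(string[] urls, int chunkSize, int maxThreads, Action<string> scrapeUrl)
    {
        List<string[]> chunks = Chunk(urls, chunkSize);
        int next = -1; // index of the most recently claimed chunk
        int threadCount = Math.Min(maxThreads, chunks.Count);
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                int i;
                // Interlocked.Increment hands each chunk index to exactly one thread.
                while ((i = Interlocked.Increment(ref next)) < chunks.Count)
                {
                    foreach (string url in chunks[i])
                        scrapeUrl(url);
                }
            });
            threads[t].Start();
        }
        // Wait for every worker to finish.
        foreach (Thread thread in threads)
            thread.Join();
    }
}
```

For the 10,000-URL list, `ChunkedScraper.Run(urls, 50, 20, scrape)` would create 200 chunks shared among 20 threads. Letting threads pull chunks, rather than giving each thread a fixed share up front, keeps all of them busy even when some sites respond more slowly than others.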

2 Answers


Take a look at the Task Parallel Library and especially the Parallel.ForEach method.

Matthias

If you are on .NET 4 or later, you can use the TPL with something like the following:

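using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

const string path = @"c:\urls.txt";
string[] urls = File.ReadAllLines(path);

// Allow at most 20 URLs to be processed at the same time.
var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };

Parallel.ForEach(urls, options, url =>
{
    // Call your scraper here (Debug.WriteLine only writes output in Debug builds).
    Debug.WriteLine(url);
});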
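One thing worth knowing before running this over 10,000 URLs: if any iteration throws, `Parallel.ForEach` stops launching new iterations and rethrows the failures wrapped in an `AggregateException`, so a single bad URL can end the whole run. A minimal sketch that records failing URLs instead, assuming .NET 4 or later; `ParallelScraper`, `ScrapeAll` and the `scrapeUrl` callback are hypothetical names, not part of the TPL.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical wrapper around the loop above; ScrapeAll and scrapeUrl are
// illustrative names, not TPL APIs.
public static class ParallelScraper
{
    // Calls scrapeUrl for every URL, at most maxParallel at a time, and returns
    // the URLs whose scrape threw instead of letting one failure abort the loop.
    public static ConcurrentBag<string> ScrapeAll(string[] urls, int maxParallel, Action<string> scrapeUrl)
    {
        var failed = new ConcurrentBag<string>(); // safe to add to from many threads
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.ForEach(urls, options, url =>
        {
            try
            {
                scrapeUrl(url);
            }
            catch (Exception)
            {
                failed.Add(url);
            }
        });
        return failed;
    }
}
```

Afterwards the returned bag holds the URLs you may want to retry. `ConcurrentBag<T>` is used because several iterations can add to it at the same time.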
Jonas Elfström