I need to download a lot of pages through proxies. What is the best practice for building a multi-threaded web crawler? Is Parallel.For/ForEach good enough, or is it better suited to heavy CPU-bound work than to downloading? What do you say about the following code?
var multyProxy = new MultyProxy();
multyProxy.LoadProxyList();

Task[] taskArray = new Task[1000];
for (int i = 0; i < taskArray.Length; i++)
{
    taskArray[i] = new Task(obj =>
        {
            multyProxy.GetPage((string)obj);
        },
        "http://google.com"
    );
    taskArray[i].Start();
}
Task.WaitAll(taskArray);
It works horribly: it is very slow, and I don't know why. This code also performs badly:
System.Threading.Tasks.Parallel.For(0, 1000,
    new System.Threading.Tasks.ParallelOptions { MaxDegreeOfParallelism = 30 },
    loop =>
    {
        multyProxy.GetPage("http://google.com");
    });
Well, I think I am doing something wrong. When I start my script, it uses only 2%-4% of the network.
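For comparison, here is a minimal sketch of the throttled pattern I imagine is intended, using a SemaphoreSlim to cap how many downloads run at once instead of starting 1000 unthrottled tasks. The `Task.Delay` is a stand-in for the real `multyProxy.GetPage(url)` call, and the concurrency limit of 10 is an arbitrary example value:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottleDemo
{
    static int current = 0; // jobs running right now
    static int peak = 0;    // highest concurrency observed

    // Run totalJobs "downloads", never more than maxConcurrency at a time.
    // Returns the peak concurrency actually reached, for inspection.
    public static async Task<int> RunAsync(int totalJobs, int maxConcurrency)
    {
        var gate = new SemaphoreSlim(maxConcurrency);
        var tasks = Enumerable.Range(0, totalJobs).Select(async i =>
        {
            await gate.WaitAsync(); // block the (i+1)th job past the limit
            try
            {
                int now = Interlocked.Increment(ref current);
                InterlockedMax(ref peak, now);
                await Task.Delay(10); // stand-in for multyProxy.GetPage(url)
                Interlocked.Decrement(ref current);
            }
            finally
            {
                gate.Release(); // let the next queued job start
            }
        });
        await Task.WhenAll(tasks);
        return peak;
    }

    // Lock-free max update so the peak counter is race-safe.
    static void InterlockedMax(ref int target, int value)
    {
        int snapshot;
        while (value > (snapshot = Volatile.Read(ref target)))
            Interlocked.CompareExchange(ref target, value, snapshot);
    }
}
```

With this shape the 1000 URLs queue up behind the semaphore rather than all contending at once, and the limit can be tuned to what the proxies tolerate.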