I wrote a program that spawns a user-chosen number of threads, each of which crawls the internet in search of some data; you might call it a web crawler.
The bottleneck of the program should be network capacity, since each thread spends the majority of its time waiting on network requests:
WebClient client = new WebClient();
string url = "http://averynice.web.api?x=2d2d2&?y=dwdwdw";
string response = client.DownloadString(url);
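The setup described above can be sketched roughly as follows. This is a simplified illustration, not the actual program: the names `CrawlerSketch`, `Run`, and `Download`, the shared URL queue, and the `fetch` delegate are all assumptions made for the example. Only the blocking `WebClient.DownloadString` call is taken from the snippet above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Threading;

public static class CrawlerSketch
{
    // Hypothetical driver: starts `threadCount` worker threads that pull URLs
    // from a shared queue and call `fetch` on each one.
    // Returns the number of pages processed once every thread has finished.
    public static int Run(int threadCount, IEnumerable<string> urls, Func<string, string> fetch)
    {
        var queue = new ConcurrentQueue<string>(urls);
        int processed = 0;
        var threads = new List<Thread>();

        for (int i = 0; i < threadCount; i++)
        {
            var worker = new Thread(() =>
            {
                string url;
                // Each worker keeps taking URLs until the queue is empty.
                while (queue.TryDequeue(out url))
                {
                    fetch(url); // blocks while waiting on the network
                    Interlocked.Increment(ref processed);
                }
            });
            threads.Add(worker);
            worker.Start();
        }

        // Wait for all workers to drain the queue.
        foreach (var worker in threads)
        {
            worker.Join();
        }
        return processed;
    }

    // The blocking download from the snippet above, wrapped in a method.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }
}
```

A call such as `CrawlerSketch.Run(20, urls, CrawlerSketch.Download)` would correspond to one program instance running 20 threads. Passing the download step in as a delegate is only there so the threading part can be exercised without network access.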
The problem I am experiencing is that the program reaches its peak speed (in terms of how many web pages it processes) when I make it spawn about 20 threads, that speed being about 1,000 pages per minute. With any more threads than that, the speed drops as I add threads.
On the other hand, if I launch 10 or even 20 separate instances of the program and spawn 20 threads in each, every instance reaches the same top speed, giving a cumulative speed of about 1,000 pages per minute multiplied by the number of instances running.
I read here on Stack Overflow that:
Both processes and threads are independent sequences of execution. The typical difference is that threads (of the same process) run in a shared memory space, while processes run in separate memory spaces.
So I figure the problem is the size of the shared memory space, but how do I change that so that a single instance can run as many threads as my network capacity will handle?
If the problem isn't the shared memory space, then what is the limiting factor or bottleneck, and how might I work around it?
Thanks in advance for any help or suggestions :).