I'm writing a 'web parser', except it's for just one website, and it needs to process many different pages at the same time.
Currently there are around 300,000 pages I need to parse, and I need to do it relatively quickly. I'm only grabbing a tiny amount of information from each page, so each page takes at most ~3 seconds on my network. At 3 seconds per page, 300,000 pages is 900,000 seconds, which is roughly 10 days, and that is terrible performance. I'd like to reduce this to a couple of hours at most; I'm flexible about the exact trade-off between time and request rate, but it still needs to be 'fast'. I also know I can't fire off all 300,000 requests at once or the website will block me, so there will have to be a delay of a few seconds between requests.
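To put rough numbers on it (a back-of-envelope sketch, assuming ~3 seconds per page and requests spread evenly across concurrent workers; the 4-hour target is just an example figure):

```python
pages = 300_000
secs_per_page = 3

# Serial worst case: one request at a time.
total_serial_secs = pages * secs_per_page      # 900,000 s
serial_days = total_serial_secs / 86_400       # ~10.4 days

# To finish in ~4 hours instead, this many requests would need
# to be in flight on average at any moment:
target_secs = 4 * 3600
avg_concurrency = total_serial_secs / target_secs   # ~62.5

print(f"serial: {serial_days:.1f} days; "
      f"concurrency needed for 4 h: {avg_concurrency:.1f}")
```

So the goal is on the order of tens of concurrent requests, which is why a single sequential loop can't get anywhere near the target.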
I currently have it processing everything in a single foreach loop, not taking advantage of any multithreading whatsoever. I know I could parallelize it, but I'm not sure which path to take: thread pools, or some other threading system or design.
Basically, I'm looking for someone to point me toward an efficient multithreading approach, some sort of system or structure for the threads, so I can cut down the time it takes to parse that many pages on my end.
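This isn't my real code, but roughly the structure I have in mind, sketched in Python just to illustrate: a fixed-size pool of workers pulling pages from a shared queue, with a per-request delay to stay under the site's rate limit. `parse_page` is a stand-in for my actual fetch-and-parse step, and the worker count and delay are made-up placeholder values.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def parse_page(url):
    # Placeholder for the real HTTP fetch + parse of one page;
    # here we just simulate extracting a tiny piece of data.
    time.sleep(0.01)          # stand-in for network latency
    return url, len(url)      # pretend "parsed" result

def parse_all(urls, workers=8, delay=0.0):
    # Each worker takes the next URL from the pool's queue;
    # the optional per-task delay throttles the request rate.
    def task(url):
        result = parse_page(url)
        time.sleep(delay)     # politeness delay so the site doesn't block us
        return result
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input URLs.
        return list(pool.map(task, urls))

if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(40)]
    start = time.time()
    results = parse_all(urls, workers=8)
    print(f"parsed {len(results)} pages in {time.time() - start:.2f}s")
```

Is something along these lines the right shape, or is there a better structure (producer/consumer queues, async I/O, etc.) for this many pages?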
Thanks