I am having a hard time understanding multithreading and parallel programming. I have a small scraper application written in C# (.NET) with Selenium. I have a file that contains business addresses. The scraper looks up each company's name and website, and then does a second scrape of the company site for a generic email address.
Here is the issue. Doing this manually would take me about 3 years for 50,000 records. I did the math. Lol. That's why I created the scraper. A normal console application took 5 to 6 days to complete. Then I thought that multithreading and parallel programming might reduce the time.
So I ran a small test. I noticed that 1 record took 10 seconds to finish, and 10 records took 100 seconds. My question is: why did the multithreaded version take the same time as processing the records one by one?
I am not sure whether my expectations and understanding of multithreading are wrong. I thought that Parallel.ForEach would launch all ten records at once and finish in about 10 seconds, saving me 90 seconds. Is this a correct assumption? Can someone please clarify how multithreading and parallel programming actually work?
private static List<GoogleList> MultiTreadMain(List<FileStructure> values)
{
    List<GoogleList> ListGInfo = new List<GoogleList>();
    var threads = new List<Thread>();
    Parallel.ForEach(values, value =>
    {
        if (value.ID <= 10)
        {
            List<GoogleList> SingleListGInfo = new List<GoogleList>();
            var threadDesc = new Thread(() =>
            {
                lock (lockObjDec)
                {
                    SingleListGInfo = LoadBrowser("https://www.google.com", value.Address, value.City, value.State,
                        value.FirstName, value.LastName,
                        "USA", value.ZipCode, value.ID);
                    SingleListGInfo.ForEach(p => ListGInfo.Add(p));
                }
            });
            threadDesc.Name = value.ID.ToString();
            threadDesc.Start();
            threads.Add(threadDesc);
        }
    });
    while (threads.Count > 0)
    {
        for (var x = (threads.Count - 1); x > -1; x--)
        {
            if (((Thread)threads[x]).ThreadState == System.Threading.ThreadState.Stopped)
            {
                ((Thread)threads[x]).Abort();
                threads.RemoveAt(x);
            }
        }
        Thread.Sleep(1);
    }
    return ListGInfo;
}
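
To make the timing question concrete, here is a minimal sketch that is separate from the real scraper. It assumes a hypothetical FakeScrape method that only sleeps in place of LoadBrowser, and the names LockTimingDemo, Gate, and Run are made up for illustration. It starts one thread per record, as the code above does, and measures the total time with and without a shared lock around the slow part. If the lock is what forces the records to run one at a time, the first run should take about count times the per-record time, and the second run about one per-record time.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class LockTimingDemo
{
    // Hypothetical shared lock object, playing the role of lockObjDec.
    private static readonly object Gate = new object();

    // Hypothetical stand-in for one LoadBrowser call: it only sleeps.
    private static void FakeScrape(int workMs)
    {
        Thread.Sleep(workMs);
    }

    // Starts one thread per record, waits for all of them with Join,
    // and returns the elapsed wall-clock time in milliseconds.
    public static long Run(int count, int workMs, bool holdLockDuringWork)
    {
        var threads = new List<Thread>();
        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < count; i++)
        {
            var thread = new Thread(() =>
            {
                if (holdLockDuringWork)
                {
                    // Same shape as the code above: only one thread at a
                    // time can be inside the lock, so the work is serialized.
                    lock (Gate)
                    {
                        FakeScrape(workMs);
                    }
                }
                else
                {
                    // No shared lock around the slow part: threads overlap.
                    FakeScrape(workMs);
                }
            });
            thread.Start();
            threads.Add(thread);
        }

        // Join blocks until each thread has finished; no polling loop needed.
        foreach (var thread in threads)
        {
            thread.Join();
        }

        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine("lock around work: " + Run(10, 200, true) + " ms");
        Console.WriteLine("no lock around work: " + Run(10, 200, false) + " ms");
    }
}
```

With 10 threads and 200 ms of simulated work, the first line should report around 2,000 ms and the second around 200 ms, though exact numbers vary by machine. This only models the lock; a real scraper would still need to protect the shared result list, for instance by locking only around the Add call rather than around the whole scrape.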