Ok, I will try to explain my problem as best I can. In the code snippet below, I'm passing tempURLTestedVsCaptured to my method CheckForDuplicates_capturedUrls.
This method checks each URL (contained within my URLs object) for duplicates and, if it is not a duplicate, adds it to a new URLs object. Once done, it sets the original tempURLs parameter to reference the new object.
The problem is that tempURLTestedVsCaptured is not getting the new reference. If I watch tempURLs, it has the correct value at the end of the method, but when execution jumps back out to the Crawl method, tempURLTestedVsCaptured has reverted to its original value.
If I change tempURLs itself, for example by adding a URL to it, the change is reflected.
If I do:
tempURLs = new URLs();
tempURLs = processedURLs;
It won't pick up the change. I'm clearly missing something very fundamental here in my learning, but I can't put my finger on it.
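Here is a minimal repro of the behaviour that is confusing me, using a made-up Box class rather than my real URLs type:

class ReferenceDemo
{
    class Box { public int Value; }

    static void Reassign(Box b)
    {
        b = new Box { Value = 99 };   // the caller's variable does not pick this up
    }

    static void Mutate(Box b)
    {
        b.Value = 99;                 // the caller does see this change
    }

    static void Main()
    {
        Box box = new Box { Value = 1 };
        Reassign(box);
        System.Console.WriteLine(box.Value);  // still 1
        Mutate(box);
        System.Console.WriteLine(box.Value);  // now 99
    }
}

That matches what I'm seeing with tempURLs. My actual code is below: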
private void CheckForDuplicates_capturedUrls(URLs tempURLs)
{
    URLs unprocessedURLs = (URLs)tempURLs;
    URLs processedURLs = new URLs();

    foreach (URL url in unprocessedURLs)
    {
        if (!crawlContext.capturedUrls.ContainsURL(url))
        {
            processedURLs.AddURL(url);
        }
    }

    tempURLs = new URLs();
    tempURLs = processedURLs;
}
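For completeness, I believe the reassignment at the end of this method would only reach Crawl if the parameter were passed by ref (or if the method returned the new collection and the caller assigned it). A sketch of the ref version, reusing my existing URLs/URL types and the crawlContext field; I haven't adopted it, for the threading reasons described further down:

private void CheckForDuplicates_capturedUrls(ref URLs tempURLs)
{
    URLs processedURLs = new URLs();

    foreach (URL url in tempURLs)
    {
        if (!crawlContext.capturedUrls.ContainsURL(url))
        {
            processedURLs.AddURL(url);
        }
    }

    tempURLs = processedURLs; // with ref, Crawl's tempURLTestedVsCaptured would now point at the new object
}

// call site in Crawl:
this.CheckForDuplicates_capturedUrls(ref tempURLTestedVsCaptured);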
private void Crawl(WebScraper_Context crawlContext)
{
    URLs tempURLTestedVsVisited = new URLs();
    URLs tempURLTestedVsCaptured = new URLs();

    while (crawlContext.unVistedURLs.Count() != 0) // While we still have URLs we have not visited, continue
    {
        foreach (URL url in crawlContext.unVistedURLs)
        {
            // If we have not visited the page yet
            if (!crawlContext.vistedURLs.ContainsURL(url)) // Visit the URL if there is one
            {
                crawlContext.vistedURLs.AddURL(url);
                LoadPage(url.url);
                doc = GetSubSetXPath(doc, crawlContext.xPath);
            }

            if (doc != null)
            {
                crawlContext.scrapedUrls = ScrapeURLS();
                crawlContext.scrapedUrls = GetLocalUrls(crawlContext.scrapedUrls);

                // Cache the URLs so we can check whether we have seen them before
                foreach (URL newURL in crawlContext.scrapedUrls)
                {
                    if (!tempURLTestedVsVisited.ContainsURL(newURL))
                    {
                        tempURLTestedVsVisited.AddURL(newURL);
                        tempURLTestedVsCaptured.AddURL(newURL);
                    }
                    else
                    {
                        System.Windows.Forms.MessageBox.Show("Duplicate URL found in scraped URLS");
                    }
                }

                this.CheckForDuplicates_capturedUrls(tempURLTestedVsCaptured); // <-- the call that does not take effect

                foreach (URL newURL in crawlContext.scrapedUrls)
                {
                    if (tempURLTestedVsVisited.ContainsURL(newURL) && tempURLTestedVsCaptured.ContainsURL(newURL))
                    {
                        crawlContext.newURLs.AddURL(newURL);
                        crawlContext.capturedUrls.AddURL(newURL);
                    }
                }
            }
        }

        crawlContext.unVistedURLs = new URLs();
        crawlContext.unVistedURLs = crawlContext.newURLs;
        crawlContext.newURLs = new URLs();
    }

    if (RequestStop == true)
    {
        RequestStop = false;
    }

    System.Windows.Forms.MessageBox.Show("Complete");
}
Ok, T. Kiley completely explains my problem and why I'm getting it. The reason I'm not returning a URLs object, and why I'm doing a pointless cast, is that the method signature is planned to be:
private void CheckForDuplicates_capturedUrls(object tempURLs)
The method is going to be used as a thread start, i.e. "DuplicateCheckerB = new Thread(this.CheckForDuplicates_capturedUrls);" and "DuplicateCheckerA.Start(tempURLTestedVsVisited);". I originally thought my problem was down to threading, so I stripped the threading out in the process of debugging.
Now, would I be right in thinking that I have to modify the actual object (i.e. remove the URLs from it in place) if I am going to pass it to a thread?
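If that is the case, here is a rough sketch of what I think the thread version would have to look like, mutating the same object the caller holds instead of reassigning the parameter. Note this assumes my URLs class has (or gets) a RemoveURL method, which it doesn't have yet:

private void CheckForDuplicates_capturedUrls(object state)
{
    URLs tempURLs = (URLs)state; // ParameterizedThreadStart hands the argument over as object, hence the cast

    // Collect the duplicates first, then strip them out of the object the caller also holds.
    URLs duplicates = new URLs();
    foreach (URL url in tempURLs)
    {
        if (crawlContext.capturedUrls.ContainsURL(url))
        {
            duplicates.AddURL(url);
        }
    }

    foreach (URL url in duplicates)
    {
        tempURLs.RemoveURL(url); // assumed method; removes the URL from the collection in place
    }
}

// usage:
DuplicateCheckerA = new Thread(this.CheckForDuplicates_capturedUrls);
DuplicateCheckerA.Start(tempURLTestedVsCaptured);
DuplicateCheckerA.Join(); // wait before reading tempURLTestedVsCaptured again

Because tempURLs and tempURLTestedVsCaptured refer to the same object, the removals would show up in Crawl once the thread has finished; I assume I'd also need to make sure nothing else touches the collection while the thread is running.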