I'm rewriting the spidering/crawler portion of a Delphi 6 site-mapper application that I previously wrote. The app spiders a single site.
I need to manage two aspects of this:
- A Queue for URLs to scan, first in, first out.
- A Scanned list of URLs, so that links found on a new page are not added to the queue if they were already visited. This list needs to be searched for every link encountered.
Previously these were a TList and a TStringList respectively. Unsurprisingly, the performance of both degrades on sites with thousands of links, since searching an unsorted TStringList is a linear scan.
My question is: what should I use for this queue and this list to get the best performance? I have little experience with hashes.
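To be clear about the pattern I'm describing, here is a minimal sketch in Python (just to illustrate the logic; `get_links` is a stand-in for my page-fetching/link-extraction step, and the names are mine). The part I want to speed up in Delphi is the `in visited` membership test:

```python
from collections import deque

def crawl_order(start_url, get_links):
    """BFS over a site using a FIFO queue plus a visited set.
    get_links(url) returns the URLs linked from a page."""
    queue = deque([start_url])   # FIFO queue of URLs still to scan
    visited = {start_url}        # hash set: O(1) membership test
    order = []                   # pages in the order they were scanned
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in visited:  # constant-time check, not a linear scan
                visited.add(link)
                queue.append(link)
    return order

# Tiny in-memory "site" to show the dedup behaviour:
site = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
print(crawl_order("a", lambda u: site.get(u, [])))  # ['a', 'b', 'c']
```

In Python the `set` gives the constant-time lookup; I'm asking what the equivalent choices are in Delphi 6.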