I want to know how Scrapy filters crawled URLs. Does it store all crawled URLs in something like a crawled_urls_list, and when it gets a new URL, does it look up that list to check whether the URL already exists?
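To make my guess concrete, here is a minimal sketch of the kind of duplicate filter I have in mind. All names here (NaiveDupeFilter, request_seen) are my own invention for illustration, not Scrapy's actual API:

```python
class NaiveDupeFilter:
    """My mental model of a duplicate filter: remember every URL seen
    so far and reject any URL that has appeared before. This is only
    a guess, not Scrapy's real implementation."""

    def __init__(self):
        # A set gives O(1) membership tests, unlike scanning a list.
        self.seen_urls = set()

    def request_seen(self, url):
        """Return True if the URL was already crawled; otherwise record it."""
        if url in self.seen_urls:
            return True
        self.seen_urls.add(url)
        return False


f = NaiveDupeFilter()
print(f.request_seen("http://example.com/a"))  # False (first time)
print(f.request_seen("http://example.com/a"))  # True  (duplicate)
```

Is the real mechanism roughly like this, or does Scrapy do something smarter (hashing, fingerprints, etc.)?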
Where is the code for this filtering part of CrawlSpider (/path/to/scrapy/contrib/spiders/crawl.py)?
Thanks a lot!