According to the documentation, DUPEFILTER_CLASS is already set to scrapy.dupefilter.RFPDupeFilter by default.
RFPDupeFilter doesn't help once you stop the crawler: it only works while a crawl is running, where it keeps you from scraping duplicate URLs.
It looks like you need to create your own custom filter based on RFPDupeFilter, like it was done here: how to filter duplicate requests based on url in scrapy. If you want your filter to work between scrapy crawl sessions, you should keep the list of crawled URLs somewhere persistent, such as a database or a CSV file.
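As a rough illustration of that idea, here is a minimal sketch that keeps the seen URLs in a SQLite file so they survive between runs. The names SeenUrlStore, PersistentUrlFilter and the seen_urls.sqlite path are made up for this example and are not part of Scrapy. The import path is an assumption too: recent Scrapy releases use scrapy.dupefilters, older ones used scrapy.dupefilter. The fallback to object is only there so the storage part can be tried without Scrapy installed.

```python
import sqlite3

try:
    # Assumption: recent Scrapy releases; older ones used scrapy.dupefilter.
    from scrapy.dupefilters import RFPDupeFilter
except ImportError:
    # Fallback so the storage logic can be tried without Scrapy installed.
    RFPDupeFilter = object


class SeenUrlStore:
    """Hypothetical helper: a set of URLs persisted in a SQLite file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def add(self, url):
        """Record url; return True if it was new, False if already stored."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO seen (url) VALUES (?)", (url,)
        )
        self.conn.commit()
        return cur.rowcount == 1

    def close(self):
        self.conn.close()


class PersistentUrlFilter(RFPDupeFilter):
    """Sketch of a URL-based filter that remembers URLs across crawls."""

    db_path = "seen_urls.sqlite"  # hypothetical location, adjust as needed
    _store = None

    def request_seen(self, request):
        # Open the store lazily, on the first request that gets checked.
        if self._store is None:
            self._store = SeenUrlStore(self.db_path)
        # Returning True tells Scrapy to drop the request as a duplicate.
        return not self._store.add(request.url)
```

To try it, you would point DUPEFILTER_CLASS in settings.py at wherever the class lives in your project, for example 'myproject.dupefilters.PersistentUrlFilter' (a hypothetical module path). Note that this version compares raw URLs only, so it ignores the request method and body that the default fingerprint takes into account, and it also drops requests sent with dont_filter=False for any URL stored by an earlier run.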
Hope that helps.