I'm using RCrawler to crawl ~300 websites. The sites vary considerably in size: some are small (a dozen or so pages) and others are large (thousands of pages per domain). Crawling the large ones is very time-consuming, and, for my research purposes, the added value of extra pages decreases once I already have a few hundred from a domain.
So: is there a way to stop the crawl once x pages have been collected?
I know I can limit the crawl with MaxDepth, but even at MaxDepth = 2 the large sites still take too long, and MaxDepth = 1 is not desirable for my research. Also, I'd prefer to keep MaxDepth high so that the smaller websites do get crawled completely.
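For reference, here is a minimal sketch of the kind of call I'm running per domain (the URL and the no_cores/no_conn values are just placeholders for my actual setup):

    library(Rcrawler)

    # Sketch of the per-domain call ("https://example.com" stands in for
    # each of the ~300 sites). MaxDepth caps link depth, not the number
    # of pages collected, so large sites still yield thousands of pages.
    Rcrawler(Website = "https://example.com",
             no_cores = 4,
             no_conn  = 4,
             MaxDepth = 2)

What I'd like is something like a page-count cap alongside (or instead of) MaxDepth, so a small site is crawled fully while a large one stops after a few hundred pages.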
Thanks a lot!