I am using Scrapy version 1.5.1. I wrote a parser that extracts URLs from the main page, then extracts URLs from the pages it has already parsed, and so on. Scrapy works asynchronously and makes parallel connections. The problem is that I have custom logic deciding which URLs should be parsed first, keeping a set of URLs I have already visited, enforcing a maximum number of URLs to visit, etc.
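To illustrate the structure, here is a simplified sketch of my spider (the URL, the attribute names and the link-selection logic are placeholders, not my real code):

```python
import scrapy


class MySpider(scrapy.Spider):
    name = "my_spider"
    start_urls = ["https://example.com"]  # placeholder for the main page

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.visited = set()   # URLs already parsed
        self.max_urls = 100    # stop after this many pages

    def parse(self, response):
        self.visited.add(response.url)
        # ... my logic for deciding which links matter and in what order ...
        for link in response.css("a::attr(href)").extract():
            url = response.urljoin(link)
            if url not in self.visited and len(self.visited) < self.max_urls:
                yield scrapy.Request(url, callback=self.parse)
```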
At first I set CONCURRENT_REQUESTS_PER_DOMAIN=1 and CONCURRENT_REQUESTS=1, but it did not help, because I think the scheduler caches the URLs that will be processed next and can then execute them in a different order.
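For reference, this is roughly how I applied those settings (in settings.py; the same could also be set per spider via custom_settings):

```python
# settings.py
CONCURRENT_REQUESTS = 1
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```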
What I need is to force Scrapy to process one URL, wait until it is finished, and only then start parsing the next URL, and so on. Is there a way to configure Scrapy to do this?
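To make the desired behaviour concrete, the effect I am after would look something like the following sketch, where the next request is only yielded after the current response has been fully processed (pick_next_url is a hypothetical stand-in for my priority logic, not an existing method):

```python
    def parse(self, response):
        # finish processing this page completely first
        self.visited.add(response.url)
        next_url = self.pick_next_url(response)  # hypothetical: my ordering logic
        if next_url and len(self.visited) < self.max_urls:
            # only now should the next URL be fetched
            yield scrapy.Request(next_url, callback=self.parse)
```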