
I have the following spider, which requests the start_urls and, for every URL found there, has to make many sub-requests.

def parse(self, response): 
    print(response.request.headers['User-Agent'])

    for info in response.css('div.infolist'):

        item = MasterdataScraperItem()
        
        info_url = BASE_URL + info.css('a::attr(href)').get() # URL to subpage
        print('Subpage: ' + info_url)
    
        item['name'] = info.css('img::attr(alt)').get()
        
        yield scrapy.Request(info_url, callback=self.parse_info, meta={'item': item})

The for loop in the code above runs around 200 times, and after around 100 iterations I start getting HTTP status code 429 (Too Many Requests).

My idea was to set DOWNLOAD_DELAY to 3.0, but this somehow does not apply to the loop; scrapy.Request is simply called a few hundred times in a row.
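
For reference, this is roughly how the delay was set (a minimal sketch, assuming it goes in the spider's custom_settings; the spider and its name are illustrative):

import scrapy

class MasterdataSpider(scrapy.Spider):
    name = "masterdata"  # illustrative name
    # Intended to wait 3 seconds between downloads, but the sub-requests
    # yielded from parse() still go out back to back
    custom_settings = {
        "DOWNLOAD_DELAY": 3.0,
    }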

Is there a way to wait n seconds before the next scrapy.Request is issued?

    Does this answer your question? [How to give delay between each requests in scrapy?](https://stackoverflow.com/questions/8768439/how-to-give-delay-between-each-requests-in-scrapy) – Kulasangar Jan 05 '23 at 11:42
  • @Kulasangar No, I have mentioned that I have tried it with DOWNLOAD_DELAY but it's not getting applied to scrapy.Request – csphmay Jan 05 '23 at 11:49
  • check out concurrent_requests and autothrottle settings – Alexander Jan 06 '23 at 02:52
  • @Alexander concurrent_requests is set to 1 and autothrottle is enabled – csphmay Jan 06 '23 at 09:45

1 Answer


You can limit the number of requests handled by the downloader at the same time by setting CONCURRENT_REQUESTS:

import scrapy

class MySpider(scrapy.Spider):
    custom_settings = {
        "CONCURRENT_REQUESTS": 1,  # only one request in flight at a time
    }
    # Rest of code
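
If a single concurrent request is still too fast for the server, it can be combined with DOWNLOAD_DELAY and AutoThrottle (mentioned in the comments). A sketch of the equivalent project-wide settings in settings.py, assuming the rest of the project keeps its defaults:

CONCURRENT_REQUESTS = 1      # one request in flight at a time
DOWNLOAD_DELAY = 3.0         # wait 3 seconds between consecutive requests to the same site
AUTOTHROTTLE_ENABLED = True  # adapt the delay based on server response times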