scrapy shell: I only see "Spider opened" and then get a timeout for Zalando pages

Question

When I am in scrapy shell and I run:

fetch('https://www.google.nl')

Then I get a normal response:

2020-11-19 12:42:00 [scrapy.core.engine] INFO: Spider opened
2020-11-19 12:42:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.google.nl> (referer: None)

But when I do this for Zalando pages, for example:

fetch('https://www.zalando.de/nike-sportswear-pant-jogginghose-ni121a09o-c11.html')

Then I only see:

2020-11-19 12:46:06 [scrapy.core.engine] INFO: Spider opened

After a while I get a timeout. Why doesn't this work for Zalando pages, and what should I change to make it work?

After getting the helpful answer, I realized that this question and its answers are related: https://stackoverflow.com/questions/25429671/scrapy-shell-how-to-change-user-agent/40136365#40136365 – Sander van den Oord, Nov 19 '20 at 13:53

score 2 · Accepted Answer · answered Nov 19 '20 at 13:13


Include a User-Agent in your request's headers; this worked fine for me:

# Run inside `scrapy shell`, which provides fetch()
from scrapy import Request
url = 'https://www.zalando.de/nike-sportswear-pant-jogginghose-ni121a09o-c11.html'
# Send a browser-like User-Agent instead of Scrapy's default one
req = Request(url, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0'
})
fetch(req)

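This could be an anti-bot measure: the site may stall requests that carry Scrapy's default User-Agent (Scrapy/VERSION (+https://scrapy.org)) until they time out.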

renatodvc


Thanks, that worked perfectly :) I also added an Accept-Language header to yours: {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36', 'Accept-Language': 'de'} – Sander van den Oord Nov 19 '20 at 13:40
