Scrapy empty information on a website with protection

Question

I am struggling to get information from a website. I set ROBOTSTXT_OBEY = False, but the spider still doesn't retrieve anything. How can I fix it?

start_urls = ['https://tienda.mercadona.es/search-results?query=leche%20entera']

def parse(self, response):
    sample = response.css("div").get()
    yield {'name':sample}

Thank you so much. As far as I can see, they probably have something in place that blocks my request.

Does this answer your question? [Can scrapy be used to scrape dynamic content from websites that are using AJAX?](https://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax) — Alexander, Oct 28 '22 at 20:24

score 1 · Accepted Answer · answered Oct 30 '22 at 01:09

The site you are trying to scrape loads its content dynamically with JavaScript. Vanilla Scrapy doesn't execute JavaScript, but there are plugins that may help. A simple one that comes to mind is scrapy-playwright. Once it is installed and configured properly, it usually just requires adding DOWNLOAD_HANDLERS to the settings.py file like so:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

You will then need to pass meta={"playwright": True} as an argument to each scrapy.Request.


E Joseph


What about the Twisted reactor? Should I install it? Thank you so much. – M. Mariscal Oct 31 '22 at 06:06
Also, I am not making any requests in Scrapy myself; everything is in the parse() method. – M. Mariscal Oct 31 '22 at 06:18
The Twisted reactor is used by vanilla Scrapy and won't change anything. You need scrapy-playwright because the page has to be rendered with JavaScript before it is sent back in the response object. The parse() method receives that response object. – E Joseph Oct 31 '22 at 14:05
Sorry, this code was not enough and I couldn't manage to get the info. In the end I did it with the requests library. Thank you anyway. – M. Mariscal Nov 01 '22 at 06:50
