I've got some problems scraping https://www.autotrader.co.uk/ with the Scrapy framework: 403 (Forbidden), or 200 with a parse method that never runs

Question

When I first tried to scrape Auto Trader, I got a 403 error:

14 18:08:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.autotrader.co.uk/car-search?postcode=ec1a1aa&make=TESLA>: HTTP status code is not handled or not allowed

I resolved this by using scrapy-user-agents. After that I got:

2020-07-21 21:25:19 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
2020-07-21 21:25:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.autotrader.co.uk/robots.txt> (referer: None)
2020-07-21 21:25:19 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.autotrader.co.uk/car-search?sort=relevance&postcode=ec1a1aa&radius=1500&make=TESLA&page=1>
2020-07-21 21:25:20 [scrapy.core.engine] INFO: Closing spider (finished)

The 200 seemed good, but the parse method, which is responsible for the scraping, doesn't run. Here is my example code:

import scrapy


class AutotraderCoUkSpider(scrapy.Spider):
    name = 'autotrader_co_uk'
    print('ok')  
    start_urls = ['https://www.autotrader.co.uk/car-search?sort=relevance&'
                  'postcode=ec1a1aa&radius=1500&make=TESLA&page=1']  

    def parse(self, response):
        print(self.name)

        # example
        pages_count_tag = response.css('li.paginationMini__count::text').getall()
        yield {'text': pages_count_tag}

In the output I get 'ok', but self.name ('autotrader_co_uk') is never printed, which means parse is not being called. I tried scrapy-proxy-pool, but it didn't help. Please, no Beautiful Soup: I need help specifically with Scrapy.

Does this answer your question? [getting Forbidden by robots.txt: scrapy](https://stackoverflow.com/questions/37274835/getting-forbidden-by-robots-txt-scrapy) - sal, Jul 21 '20 at 21:01
Try to keep your question title as short and to the point as possible. This makes it easier to find when people are searching this site. You can then explain all the necessary details in the post itself. And welcome to Stack Overflow! - Ronald, Jul 21 '20 at 21:01
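
The `Forbidden by robots.txt` line in the log comes from Scrapy's robots.txt downloader middleware, which drops the request before `parse` is ever called; this is what the linked question is about. Below is a minimal `settings.py` sketch. It assumes the project currently has `ROBOTSTXT_OBEY = True` and that ignoring robots.txt is acceptable for your use case. The middleware entries and the priority 400 follow the scrapy-user-agents README and are not taken from the question itself.

```python
# settings.py (sketch; only the settings relevant to this question)

# Assumption: this is currently True in the project, which is why the
# search URL is dropped with "Forbidden by robots.txt".
ROBOTSTXT_OBEY = False

# Random User-Agent setup as documented by scrapy-user-agents:
# disable Scrapy's built-in User-Agent middleware and enable the random one.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}
```

With this change the request for the search page should at least reach the downloader; whether the site then answers with 200 or 403 depends on the site and is not shown by the log above.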
