When I first tried to scrape Autotrader, I got a 403 error:
14 18:08:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.autotrader.co.uk/car-search?postcode=ec1a1aa&make=TESLA>: HTTP status code is not handled or not allowed
This problem was resolved by using scrapy-user-agents. After that I got:
020-07-21 21:25:19 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
2020-07-21 21:25:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.autotrader.co.uk/robots.txt> (referer: None)
2020-07-21 21:25:19 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://www.autotrader.co.uk/car-search?sort=relevance&postcode=ec1a1aa&radius=1500&make=TESLA&page=1>
2020-07-21 21:25:20 [scrapy.core.engine] INFO: Closing spider (finished)
The 200 looked fine, but the parse method, which is responsible for the scraping, never runs. Here is my example code:
    import scrapy


    class AutotraderCoUkSpider(scrapy.Spider):
        name = 'autotrader_co_uk'
        print('ok')
        start_urls = ['https://www.autotrader.co.uk/car-search?sort=relevance&'
                      'postcode=ec1a1aa&radius=1500&make=TESLA&page=1']

        def parse(self, response):
            print(self.name)
            # example
            pages_count_tag = response.css('li.paginationMini__count::text').getall()
            yield {'text': pages_count_tag}
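For reference, the scrapy-user-agents README enables its middleware with settings along the following lines. This is a sketch of that configuration as it would appear in the project's settings.py, not necessarily the exact settings that produced the log above; the priority value 400 is taken from the README and is an assumption here.

```python
# settings.py (sketch): swap Scrapy's built-in User-Agent middleware
# for the random one provided by the scrapy-user-agents package.
DOWNLOADER_MIDDLEWARES = {
    # Disable the default middleware that sets a fixed User-Agent.
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # Enable the random User-Agent middleware (priority assumed from the README).
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}
```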
In the output I get 'ok', but self.name ('autotrader_co_uk') is never printed, which means parse is not being called. I tried scrapy-proxy-pool, but it doesn't help. Please, no Beautiful Soup: I need help specifically with Scrapy.
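One detail in the log above may be relevant: the 200 response is for robots.txt, while the search URL itself is reported as "Forbidden by robots.txt", which would mean the request is dropped before parse can receive a response. The sketch below illustrates that kind of check with Python's standard urllib.robotparser. The rule set is hypothetical (it is not the site's real robots.txt), the is_allowed helper is made up for this example, and Scrapy's own middleware uses its own parser backend rather than this module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real autotrader.co.uk robots.txt.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /car-search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_url = ('https://www.autotrader.co.uk/car-search?sort=relevance&'
              'postcode=ec1a1aa&radius=1500&make=TESLA&page=1')
home_url = 'https://www.autotrader.co.uk/'

# Under the hypothetical rules, the search page is disallowed and the home page is not.
print(is_allowed(HYPOTHETICAL_ROBOTS_TXT, 'Mozilla/5.0', search_url))
print(is_allowed(HYPOTHETICAL_ROBOTS_TXT, 'Mozilla/5.0', home_url))
```

In Scrapy, whether this check is applied is controlled by the ROBOTSTXT_OBEY setting, which the default project template sets to True.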