I'm trying to crawl a page, but on that page I need to press a button numerous times to load all of the content, which is why I'm using Selenium before parsing it and extracting the links.

Below is the error. What am I doing wrong?

2018-08-31 20:18:56 [twisted] CRITICAL:
Traceback (most recent call last):
  File "d:\python-projects\lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "d:\python-projects\lib\site-packages\scrapy\crawler.py", line 81, in crawl
    start_requests = iter(self.spider.start_requests())
TypeError: 'NoneType' object is not iterable

My code:

import scrapy
from scrapy.selector import Selector
from scrapy.spider import Spider
from scrapy.utils.markup import remove_tags
from selenium import webdriver


class Listings(Spider):
    name = "adver"
    base_url = 'https://www.test.com/xxxxx1'

    def start_requests(self):
        self.driver = webdriver.Firefox(executable_path=r'D:\python-projects\geckodriver.exe')
        self.driver.get(self.base_url)
        while True:
            load_content = self.driver.find_element_by_xpath('/html/body/div[5]/div[3]/div[1]/button')
            try:
                self.parse(self.driver.page_source)
                load_content.click()
            except:
                break
        self.driver.close()


    def parse(self, response):
        for link in response.css("a.ad-title-link"):
            ad_link = link.css('a::attr(href)').extract_first()
            yield {'link': ad_link}
  • Should `start_requests` return something? – Andersson Aug 31 '18 at 10:44
  • start_requests should just extract the HTML and pass it on to self.parse(driver.page_source) – mrWiga Aug 31 '18 at 10:47
  • Targeting the button by its full XPath, without a class/id or anything else unique, is fragile: any element added to or changed in the tree will cause an exception. Check whether the load button has a unique class or id and use that; if you post some of the HTML here, we can help with it. – Shlomi Bazel Aug 31 '18 at 17:40

1 Answer

You need to pass the Selenium-rendered page back into a Scrapy parse callback.
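
A minimal sketch of that approach (the callback name parse_with_selenium and the WebDriverException catch are my additions; the XPath and geckodriver path come from the question): start_requests has to yield Request objects — returning nothing is what caused your "'NoneType' object is not iterable" — so the Selenium work moves into the callback, and the rendered page is wrapped in an HtmlResponse so the usual Scrapy selectors work on it:

import scrapy
from scrapy import Spider
from scrapy.http import HtmlResponse
from selenium import webdriver
from selenium.common.exceptions import WebDriverException


class Listings(Spider):
    name = "adver"
    base_url = 'https://www.test.com/xxxxx1'

    def start_requests(self):
        # start_requests must yield an iterable of Requests
        yield scrapy.Request(self.base_url, callback=self.parse_with_selenium)

    def parse_with_selenium(self, response):
        driver = webdriver.Firefox(executable_path=r'D:\python-projects\geckodriver.exe')
        driver.get(self.base_url)
        while True:
            try:
                # keep clicking the "load more" button until it is gone
                driver.find_element_by_xpath('/html/body/div[5]/div[3]/div[1]/button').click()
            except WebDriverException:
                break
        # wrap the fully loaded page so normal Scrapy selectors work on it
        selenium_response = HtmlResponse(url=driver.current_url,
                                         body=driver.page_source,
                                         encoding='utf-8')
        driver.quit()
        for link in selenium_response.css('a.ad-title-link::attr(href)').extract():
            yield {'link': link}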

I recommend using Scrapy with a Selenium downloader middleware.
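
A sketch of that setup, assuming a standard project layout (the class name SeleniumMiddleware and the module path myproject.middlewares are placeholders): a downloader middleware fetches each request with the real browser and hands the rendered HTML back as an ordinary response, so the spider itself stays free of Selenium code:

# middlewares.py
from scrapy.http import HtmlResponse
from selenium import webdriver


class SeleniumMiddleware:
    def __init__(self):
        self.driver = webdriver.Firefox(executable_path=r'D:\python-projects\geckodriver.exe')

    def process_request(self, request, spider):
        # fetch the page with the real browser instead of Scrapy's downloader;
        # returning a Response here short-circuits the normal download
        self.driver.get(request.url)
        return HtmlResponse(url=self.driver.current_url,
                            body=self.driver.page_source,
                            encoding='utf-8',
                            request=request)

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.SeleniumMiddleware': 543,
}

With that enabled, every response your callbacks receive already contains the browser-rendered HTML, and start_requests can yield plain scrapy.Request objects as usual.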

gangabass