
I am trying to deploy my Scrapy spider to my localhost through scrapyd. The spider script contains the selenium module to perform some automated web tasks. The problem arises when I try to deploy it.

After running scrapyd from the command line, I type the localhost address into my browser and it's online and listening. Then I type the scrapyd-deploy command in another cmd window, and it gets stuck like this for hours (no error message).

$ scrapyd-deploy local
Packing version 1560251984
Deploying to project "crawler" in http://localhost:6800/addversion.json


I'm using Git Bash on my Windows machine, by the way. I've also tried the normal cmd, but it's still the same endless wait and delay.
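For reference, scrapyd's JSON API can be queried from another window to check whether the daemon itself is still responding while the deploy hangs; `daemonstatus.json` and `listprojects.json` are standard scrapyd endpoints:

$ curl http://localhost:6800/daemonstatus.json
$ curl http://localhost:6800/listprojects.json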

On the first cmd window, where I run the scrapyd command to open the localhost, I get something like this.

DevTools listening on ws://127.0.0.1:9137/devtools/browser/07f179fa-02ce-4b31-a596-9b700654f105


To my understanding, that seems to be the selenium browser in headless mode trying to initiate, but it keeps waiting endlessly.

When I open my project directory, I see new folders like eggs, project.egg-info and build. So it seems the project gets eggified, but it hits a delay when scrapyd tries to deploy and run it on the localhost.
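If the selenium hunch is right, the underlying reason would be that statements placed directly in a class body execute as soon as the module is imported, and scrapyd imports the spider module when it loads the egg. A minimal sketch (hypothetical code, with `time.sleep` standing in for the webdriver start-up):

import time

class EagerSpider:
    # Class-body statements run at import time, so merely importing
    # this module blocks here - just as starting a webdriver here
    # would block scrapyd while it loads the egg.
    time.sleep(9999)

class LazySpider:
    def parse(self, response):
        # A method body runs only when the method is called,
        # i.e. during an actual crawl.
        time.sleep(9999)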

This is my spider script:

import scrapy
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait as Webwait
from selenium.webdriver.support import expected_conditions as exco
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import random

C_options = Options()
C_options.add_argument("--disable-extensions")
C_options.add_argument("--disable-gpu")
C_options.add_argument("--headless")

class CloupAppSpider(scrapy.Spider):
    driver = webdriver.Chrome(options=C_options,
                              executable_path=r"C:\.....\chromedriver.exe")
    driver.get("https://scrapingclub.com/exercise/basic_login/")
    cookie = driver.get_cookies()
    driver.add_cookie(cookie[0])
    name = 'crawl'
    allowed_domains = ['scrapingclub.com']
    start_urls = ['https://scrapingclub.com/exercise/basic_login/']

    def __init__(self, name=None, passwd=None, *args, **kwargs):
        super(CloupAppSpider, self).__init__(*args, **kwargs)
        self.passwd = passwd
        self.name = name

    def parse(self, response):
        pword = self.passwd
        uname = self.name
        Webwait(self.driver, 10).until(exco.presence_of_element_located((By.ID, "id_name")))
        Webwait(self.driver, 10).until(exco.presence_of_element_located((By.ID, "id_password")))
        CloupAppSpider.driver.find_element_by_id("id_name").send_keys(uname)
        CloupAppSpider.driver.find_element_by_id("id_password").send_keys(pword)
        CloupAppSpider.driver.find_element_by_css_selector(".btn.btn-primary").click()
        Webwait(self.driver, 10).until(exco.presence_of_element_located((By.CLASS_NAME, "col-lg-8")))
        html = CloupAppSpider.driver.execute_script("return document.documentElement.outerHTML")
        bs_obj = BeautifulSoup(html, "html.parser")
        text = bs_obj.find("div", {"class": "col-lg-8"}).find("p")
        obj = text.get_text()
        obj = obj + str(random.randint(0, 100))
        self.driver.close()
        yield {
            'text': obj
        }

This is my scrapy.cfg content:

[settings]
default = crawler.settings
[deploy:local]
url = http://localhost:6800/
project = crawler

Can someone help explain to me where I went wrong? I am clueless, as I didn't get any error message when deploying; it just keeps waiting endlessly. My guess is that it hangs when it tries to process with selenium.

  • If you open `http://localhost:6800`, what happens? – Umair Ayub Jun 12 '19 at 06:35
  • @Umair Initially the localhost page opens after the scrapyd command in cmd, but after I try to deploy it with (scrapyd-deploy local), it doesn't open any more; the web page keeps loading endlessly as well. – Abraham Michael Jun 12 '19 at 12:19
  • Is your scrapyd running? Or is something else running on that port? – Umair Ayub Jun 12 '19 at 12:19
  • @Umair After I type the scrapyd command in cmd, it opens the localhost and runs it on the webpage, but after I try to deploy with (scrapyd-deploy) the localhost web page doesn't open anymore; it keeps loading. I don't know of anything else on that port, apart from scrapyd. – Abraham Michael Jun 12 '19 at 12:28
  • Are your scrapyd and the project you are working on both on the same machine? Probably set `bind_address = 0.0.0.0` in your `scrapyd.conf` file (a minimal sketch follows these comments) – Umair Ayub Jun 12 '19 at 12:29
  • Okay, I will try your suggestion now, thanks for the help so far. I'm also running the project and scrapyd on the same machine – Abraham Michael Jun 12 '19 at 12:34
  • @Umair I tried it and now I'm getting an error at least, in my scrapyd cmd window: DevTools listening on ws://127.0.0.1:22265/devtools/browser/a472c99f-a0c1-47d5-bb21-11975b70743d [0612/155022.106:ERROR:mf_helpers.cc(14)] Error in dxva_video_decode_accelerator_win.cc on line 511 – Abraham Michael Jun 12 '19 at 12:52
  • @Umair I found a question on here similar to my error code [here](https://stackoverflow.com/questions/52245604/devtools-listening-on-ws-127-0-0-157671-devtools-browser-8a586f7c-5f2c-4d10-8#_=_). I'm gonna try their solution and see where it leads me. – Abraham Michael Jun 12 '19 at 13:01
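For reference, the `bind_address` setting suggested above goes in scrapyd's configuration file; a minimal `scrapyd.conf` sketch (the port shown is scrapyd's default):

[scrapyd]
# Listen on all interfaces instead of only 127.0.0.1.
bind_address = 0.0.0.0
http_port = 6800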

1 Answer


I solved it. My mistake was that I initiated the selenium process under the main spider class, instead of inside the parse method of the spider class. I re-edited the code as follows:

import scrapy
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait as Webwait
from selenium.webdriver.support import expected_conditions as exco
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import random

C_options = Options()
C_options.add_argument("--disable-extensions")
C_options.add_argument("--disable-gpu")
C_options.add_argument("--headless")

class CloupAppSpider(scrapy.Spider):

    name = 'crawl'
    allowed_domains = ['scrapingclub.com']
    start_urls = ['https://scrapingclub.com/exercise/basic_login/']

    def __init__(self, name=None, passwd=None, *args, **kwargs):
        super(CloupAppSpider, self).__init__(*args, **kwargs)
        self.passwd = passwd
        self.name = name

    def parse(self, response):
        driver = webdriver.Chrome(options=C_options,
                                  executable_path=r"C:\......\chromedriver.exe")
        driver.get("https://scrapingclub.com/exercise/basic_login/")
        cookie = driver.get_cookies()
        driver.add_cookie(cookie[0])
        pword = self.passwd
        uname = self.name
        Webwait(driver, 10).until(exco.presence_of_element_located((By.ID, "id_name")))
        Webwait(driver, 10).until(exco.presence_of_element_located((By.ID, "id_password")))
        driver.find_element_by_id("id_name").send_keys(uname)
        driver.find_element_by_id("id_password").send_keys(pword)
        driver.find_element_by_css_selector(".btn.btn-primary").click()
        Webwait(driver, 10).until(exco.presence_of_element_located((By.CLASS_NAME, "col-lg-8")))
        html = driver.execute_script("return document.documentElement.outerHTML")
        bs_obj = BeautifulSoup(html, "html.parser")
        text = bs_obj.find("div", {"class": "col-lg-8"}).find("p")
        obj = text.get_text()
        obj = obj + str(random.randint(0, 100))
        # quit() shuts the browser down completely; close() only closes
        # the window and can leave the driver process running.
        driver.quit()
        yield {
            'text': obj
        }
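Once the deploy succeeds, the spider can be scheduled through scrapyd's JSON API; any extra POST parameters are passed to the spider as arguments, which is how `name` and `passwd` reach `__init__` here. A sketch with placeholder credentials:

$ curl http://localhost:6800/schedule.json -d project=crawler -d spider=crawl -d name=YOUR_USERNAME -d passwd=YOUR_PASSWORD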