I am trying to deploy my Scrapy spider to scrapyd on my local host. The spider script uses the selenium module to perform some automated tasks in the browser, and the problem arises when I try to deploy it.
After running scrapyd from the command line to start the local host, I type the localhost address into my browser and it is online and listening. I then run the scrapyd-deploy command in another cmd window, and it gets stuck like this for hours (no error message):
$ scrapyd-deploy local
Packing version 1560251984
Deploying to project "crawler" in http://localhost:6800/addversion.json
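To double-check that the daemon itself responds (and not just its web page), a quick probe of the status endpoint can be used; this is only a sanity check, assuming a default scrapyd install:

import urllib.request

# scrapyd's status endpoint; a healthy daemon should answer with
# something like {"status": "ok", "running": 0, ...}
with urllib.request.urlopen("http://localhost:6800/daemonstatus.json") as resp:
    print(resp.read().decode())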
I'm using Git Bash on my Windows machine, by the way; I've also tried the normal cmd, but it is the same endless wait and delay.
In the first cmd window, where I ran the scrapyd command to start the local host, I get something like this:
DevTools listening on ws://127.0.0.1:9137/devtools/browser/07f179fa-02ce-4b31-a596-9b700654f105
[screenshot: the console stuck at the DevTools listening line]
To my understanding, that is the selenium browser in headless mode trying to start, but it keeps waiting endlessly.
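If that is right, the browser is being launched the moment the module is loaded, because statements in a class body execute at import time. A tiny self-contained demonstration of that Python behaviour (demo.py is a hypothetical file name):

# demo.py: statements in a class body run when the module is imported,
# before any instance of the class is ever created
class Example:
    print("this prints on 'import demo', not on Example()")

# Importing this module (which is what scrapyd does with the egg in
# order to discover spiders) would likewise trigger a
# webdriver.Chrome(...) call placed at class level.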
When I open my project directory, I see new folders such as eggs, project.egg-info and build. So the project does get eggified, but it hangs when scrapyd tries to deploy and run it on the local host.
This is my spider script:
import scrapy
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait as Webwait
from selenium.webdriver.support import expected_conditions as exco
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import random

C_options = Options()
C_options.add_argument("--disable-extensions")
C_options.add_argument("--disable-gpu")
C_options.add_argument("--headless")


class CloupAppSpider(scrapy.Spider):
    # these statements live in the class body, so they run as soon as
    # the module is imported, before any spider instance is created
    driver = webdriver.Chrome(options=C_options,
                              executable_path=r"C:\.....\chromedriver.exe")
    driver.get("https://scrapingclub.com/exercise/basic_login/")
    cookie = driver.get_cookies()
    driver.add_cookie(cookie[0])

    name = 'crawl'
    allowed_domains = ['scrapingclub.com']
    start_urls = ['https://scrapingclub.com/exercise/basic_login/']

    def __init__(self, name=None, passwd=None, *args, **kwargs):
        super(CloupAppSpider, self).__init__(*args, **kwargs)
        self.passwd = passwd
        self.name = name

    def parse(self, response):
        pword = self.passwd
        uname = self.name
        # wait for the login form, fill it in and submit
        Webwait(self.driver, 10).until(
            exco.presence_of_element_located((By.ID, "id_name")))
        Webwait(self.driver, 10).until(
            exco.presence_of_element_located((By.ID, "id_password")))
        CloupAppSpider.driver.find_element_by_id("id_name").send_keys(uname)
        CloupAppSpider.driver.find_element_by_id("id_password").send_keys(pword)
        CloupAppSpider.driver.find_element_by_css_selector(".btn.btn-primary").click()
        Webwait(self.driver, 10).until(
            exco.presence_of_element_located((By.CLASS_NAME, "col-lg-8")))
        # hand the rendered page over to BeautifulSoup
        html = CloupAppSpider.driver.execute_script(
            "return document.documentElement.outerHTML")
        bs_obj = BeautifulSoup(html, "html.parser")
        text = bs_obj.find("div", {"class": "col-lg-8"}).find("p")
        obj = text.get_text()
        obj = obj + str(random.randint(0, 100))
        self.driver.close()
        yield {
            'text': obj
        }
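If my guess is correct, the driver would have to be created lazily, e.g. in __init__ instead of the class body, so that merely importing the module does not start Chrome. A minimal sketch of what I mean (spider name and structure are illustrative; I have not verified that this fixes the deploy):

import scrapy
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


class LazyDriverSpider(scrapy.Spider):
    name = 'crawl_lazy'  # hypothetical name; nothing runs at import time here

    def __init__(self, *args, **kwargs):
        super(LazyDriverSpider, self).__init__(*args, **kwargs)
        opts = Options()
        opts.add_argument("--headless")
        # Chrome starts only when scrapyd actually instantiates the
        # spider for a run, not when it imports the module to list
        # the spiders in the egg
        self.driver = webdriver.Chrome(options=opts)

    def parse(self, response):
        # placeholder callback; the real selenium logic would go here
        yield {'url': response.url}

    def closed(self, reason):
        # Scrapy calls this when the spider finishes; make sure the
        # browser process dies with it
        self.driver.quit()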
This is my scrapy.cfg content:
[settings]
default = crawler.settings
[deploy:local]
url = http://localhost:6800/
project = crawler
Can someone help explain where I went wrong? I am clueless, as I didn't get any error message when deploying; it just waits endlessly. My guess is that it hangs when it tries to process the selenium part.