
Scrapy 1.4

I am using this script (from Run multiple scrapy spiders at once using scrapyd) to schedule multiple spiders on Scrapyd. I was previously using Scrapy 0.19 and it ran fine.

I am receiving the error: TypeError: create_crawler() takes exactly 2 arguments (1 given)

So now I don't know whether the problem is the Scrapy version or a simple Python logic error (I am new to Python).

I made some modifications to first check whether the spider is active in the database.

import requests

from scrapy.commands import ScrapyCommand


class AllCrawlCommand(ScrapyCommand):

    requires_project = True
    default_settings = {'LOG_ENABLED': False}

    def short_desc(self):
        return "Schedule a run for all available spiders"

    def run(self, args, opts):

        cursor = get_db_connection()  # project helper that returns a DB cursor
        cursor.execute("SELECT * FROM lojas WHERE disponivel = 'S'")
        rows = cursor.fetchall()

        # Put every site domain into a list; further down, only the
        # spiders whose names match an available site get scheduled.
        sites = []
        for row in rows:
            site = row[2]
            print(site)

            # add each site to the list
            sites.append(site)

        url = 'http://localhost:6800/schedule.json'
        crawler = self.crawler_process.create_crawler()  # <-- raises the TypeError
        crawler.spiders.list()
        for s in crawler.spiders.list():
            #print(s)
            if s in sites:
                values = {'project': 'esportifique', 'spider': s}
                r = requests.post(url, data=values)
                print(r.text)
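For context, in Scrapy 1.x create_crawler() requires a spider class or name as its argument (hence the "takes exactly 2 arguments (1 given)" error), and spider_loader is the documented way to enumerate spider names from inside a command. A minimal sketch of what the failing lines could become, keeping the rest of the command as-is:

# Minimal sketch, assuming Scrapy 1.x: spider_loader replaces the old
# crawler.spiders attribute, so create_crawler() is not needed just to
# list spider names.
for s in self.crawler_process.spider_loader.list():
    if s in sites:
        values = {'project': 'esportifique', 'spider': s}
        r = requests.post(url, data=values)
        print(r.text)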
  • You should pass a crawler as a param: https://doc.scrapy.org/en/latest/topics/api.html#scrapy.crawler.CrawlerProcess.crawlers – parik Jan 25 '18 at 14:20
  • Can you give me an example, please? – Ailton Jan 25 '18 at 17:16
  • You can find examples here: https://stackoverflow.com/questions/21345092/running-multiple-scrapy-spiders-the-easy-way-python – parik Jan 25 '18 at 17:27
  • Interesting! That will be very helpful! Thank you! – Ailton Jan 26 '18 at 15:51
  • Please let me know if it solved your problem – parik Jan 26 '18 at 15:54
  • I am testing the easy-way solution proposed by Yuda and it works fine! But that process runs all the spiders together, mixing scrapes from the different domains. I would actually like to run under Scrapyd, because that way I can see each spider running, see the logs, etc. – Ailton Jan 27 '18 at 11:59 (see the sketch below)
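
As an aside on that last comment: if the goal is to keep everything going through Scrapyd so that each run gets its own job and log, spider discovery can also happen server-side. Below is a hedged sketch using Scrapyd's listspiders.json endpoint; the project name 'esportifique' and the default localhost:6800 address are taken from the question.

import requests

# Ask Scrapyd which spiders the deployed project exposes, instead of
# loading them through a Scrapy command.
resp = requests.get('http://localhost:6800/listspiders.json',
                    params={'project': 'esportifique'})
for spider_name in resp.json().get('spiders', []):
    print(spider_name)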

1 Answer


Based on the link parik suggested, here's what I did:

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess
import requests

setting = get_project_settings()
process = CrawlerProcess(setting)

url = 'http://localhost:6800/schedule.json'

cursor = get_db_connection()  # project helper that returns a DB cursor
cursor.execute("SELECT * FROM lojas WHERE disponivel = 'S'")
rows = cursor.fetchall()

# Put every site domain into a list; further down, only the
# spiders whose names match an available site get scheduled.
sites = []
for row in rows:
    site = row[2]
    print(site)

    # add each site to the list
    sites.append(site)

# spider_loader is the non-deprecated spelling of process.spiders
for spider_name in process.spider_loader.list():
    print("Running spider %s" % spider_name)
    #process.crawl(spider_name, query="dvh")  # query="dvh" is a custom spider argument
    if spider_name in sites:
        values = {'project': 'esportifique', 'spider': spider_name}
        r = requests.post(url, data=values)
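
Since the comments mention wanting to see each spider running and read its logs, here is a small follow-up sketch under the same assumptions (default Scrapyd instance, the 'esportifique' project) that polls Scrapyd's listjobs.json endpoint to see where each scheduled job stands:

import requests

# listjobs.json reports pending, running and finished jobs per project;
# the corresponding log is then served by Scrapyd under
# /logs/esportifique/<spider>/<job id>.log
jobs = requests.get('http://localhost:6800/listjobs.json',
                    params={'project': 'esportifique'}).json()
for state in ('pending', 'running', 'finished'):
    for job in jobs.get(state, []):
        print(state, job.get('spider'), job.get('id'))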