I would like to run my Scrapy spider from a Python script. I can call my spider with the following code:
subprocess.check_output(['scrapy', 'crawl', 'mySpider'])
Up to this point, everything works. However, if I first instantiate my spider class and set start_urls in its constructor, the scrapy crawl call no longer works, because it does not find the start_urls variable.
from flask import Flask, jsonify, request
import scrapy
import subprocess


class ClassSpider(scrapy.Spider):
    name = 'mySpider'
    #start_urls = []
    #pages = 0
    news = []

    def __init__(self, url, nbrPage):
        self.pages = nbrPage
        self.start_urls = url

    def parse(self, response):
        ...

    def run(self):
        # Launches the crawl in a separate process
        subprocess.check_output(['scrapy', 'crawl', 'mySpider'])
        return self.news


app = Flask(__name__)
data = []


@app.route('/', methods=['POST'])
def getNews():
    mySpiderClass = ClassSpider(request.json['url'], 2)
    data.append(mySpiderClass.run())
    return jsonify({'data': data})


if __name__ == "__main__":
    app.run(debug=True)
The error I get is: TypeError: __init__() missing 1 required positional argument: 'start_url' and 'pages'
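The same kind of failure can be reproduced without Scrapy. The sketch below assumes that scrapy crawl runs in a fresh process and builds the spider from the class itself, so the instance created in the Flask view is never used. FakeSpider, fake_crawl and the example.com URL are hypothetical stand-ins for illustration, not Scrapy APIs.

```python
class FakeSpider:
    """Hypothetical stand-in for ClassSpider; not a real Scrapy class."""
    name = 'mySpider'

    def __init__(self, url, nbrPage):
        self.start_urls = url
        self.pages = nbrPage


def fake_crawl(spider_cls, **spider_kwargs):
    """Mimic a crawler that instantiates the spider itself from its class."""
    return spider_cls(**spider_kwargs)


# An instance built beforehand is simply ignored by the crawler.
prebuilt = FakeSpider(['http://example.com'], 2)

try:
    # No arguments are supplied, as with a bare `scrapy crawl mySpider`.
    fake_crawl(FakeSpider)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The constructor requires url and nbrPage, but nothing passes them when the class is instantiated a second time, which would explain a TypeError about missing positional arguments.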
Any help would be appreciated.
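One possible direction, sketched under assumptions: Scrapy's crawl command accepts -a NAME=VALUE options and passes them to the spider's constructor as keyword arguments. The helper build_crawl_command and the example.com URL below are hypothetical. The block only builds the argument list and does not run Scrapy.

```python
def build_crawl_command(spider_name, **spider_args):
    """Build the argument list for `scrapy crawl`, adding one `-a NAME=VALUE` per spider argument."""
    command = ['scrapy', 'crawl', spider_name]
    for key, value in spider_args.items():
        command += ['-a', f'{key}={value}']
    return command


# Hypothetical values; in the Flask view the URL would come from request.json['url'].
command = build_crawl_command('mySpider', url='http://example.com', nbrPage=2)
print(command)

# Running it needs Scrapy installed and a project that defines the spider, e.g.:
# subprocess.check_output(command)
```

For this to work, the spider's __init__ would presumably need to accept url and nbrPage as keyword arguments (ideally with defaults, and forwarding any remaining keyword arguments to the parent constructor). Values passed with -a arrive as strings, so nbrPage would need converting with int().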