
This might be a subquestion of Passing arguments to process.crawl in Scrapy python, but the author marked an answer (which doesn't address the subquestion I'm asking) as satisfying.

Here's my problem: I cannot use `scrapy crawl mySpider -a start_urls=myUrl -o myData.json`.
Instead I want/need to use `crawlerProcess.crawl(spider)`. I have already figured out several ways to pass the arguments (and in any case that part is answered in the question I linked), but I can't grasp how I am supposed to tell it to dump the data into myData.json... the `-o myData.json` part.
Does anyone have a suggestion? Or am I just not understanding how it is supposed to work?

Here is the code :

# Imports for the old (pre-1.0) Scrapy API that this snippet uses;
# the spider class `challenges` and the `handleSpiderIdle` callback
# are defined elsewhere in my project.
from scrapy import log, signals
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess
from scrapy.xlib.pydispatch import dispatcher

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

spider = challenges(start_urls=["http://www.myUrl.html"])
crawlerProcess.crawl(spider)
# For now I am just trying to get this bit of code to work,
# but obviously it will become a loop later.

dispatcher.connect(handleSpiderIdle, signals.spider_idle)

log.start()
print "Starting crawler."
crawlerProcess.start()
print "Crawler stopped."
– Carele

1 Answer


You need to specify it in the settings:

process = CrawlerProcess({
    'FEED_URI': 'file:///tmp/export.json',  # equivalent of the -o flag
})

process.crawl(MySpider)
process.start()
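
If you need the equivalents of other command-line flags such as -t, -L or --logfile, those map to settings as well, and spider arguments like start_urls can be passed as keyword arguments to crawl(). A minimal sketch, assuming a spider class MySpider (the setting names are standard Scrapy settings; the paths and URL are placeholders):

process = CrawlerProcess({
    'FEED_URI': 'file:///tmp/export.json',
    'FEED_FORMAT': 'json',          # the -t flag (output format)
    'LOG_LEVEL': 'INFO',            # the -L flag
    'LOG_FILE': '/tmp/crawl.log',   # the --logfile flag
})

# Keyword arguments to crawl() are forwarded to the spider's constructor,
# so this replaces `-a start_urls=...` on the command line.
process.crawl(MySpider, start_urls=['http://www.example.com'])
process.start()

Note that in recent Scrapy versions the FEED_URI / FEED_FORMAT pair has been superseded by the FEEDS setting, so check the documentation for the version you are running.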
– eLRuLL
    But you cannot use this for instance to pass several arguments, for example `-o Sachin_urls.csv -t csv -L INFO --logfile Sachin.log`. This will work perfectly when using `scrapy crawl -a -o Sachin_urls.csv -t csv myspidername -L INFO --logfile Sachin.log`. Any pointers? – hAcKnRoCk Feb 27 '17 at 16:15