
When I call my spider through a Python script, which is as follows:

import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'project.settings')
from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.settings import CrawlerSettings
from scrapy.xlib.pydispatch import dispatcher
from spiders.image import aqaqspider
def stop_reactor():
    reactor.stop()

dispatcher.connect(stop_reactor, signal=signals.spider_closed)
spider = aqaqspider(domain='aqaq.com')
crawler = Crawler(CrawlerSettings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
log.msg('Running reactor...')
reactor.run()  # the script will block here until the spider is closed
log.msg('Reactor stopped.')

My JSON file is not being created. My pipelines.py has the following code:

import json
import codecs

class JsonWithEncodingPipeline(object):

    def __init__(self):
        self.file = codecs.open('scraped_data_utf8.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        line = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(line)
        return item

    def spider_closed(self, spider):
        self.file.close()
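
A pipeline like this only runs if it is enabled through the ITEM_PIPELINES setting. The post does not show the project's settings.py, but since scrapy crawl does produce the JSON file, it presumably contains an entry along these lines (the dotted path is a guess based on the module names used above; old Scrapy from this era takes ITEM_PIPELINES as a list):

# project/settings.py -- assumed, not shown in the post
ITEM_PIPELINES = [
    'project.pipelines.JsonWithEncodingPipeline',  # assumed path; adjust to the real project name
]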

When I call my spider from the command line with scrapy crawl, it works fine, i.e. the JSON file is created.

Please help me; I am new to Scrapy.
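
One thing worth checking (an assumption on my part, not something stated in the post): CrawlerSettings() with no arguments does not read the project settings module, so the ITEM_PIPELINES defined in project.settings is never applied when the spider is started from the script, whereas scrapy crawl does load it. A minimal sketch under that assumption, keeping the old (pre-1.0) API used in the script above; the rest of the script would stay the same:

import project.settings as settings_module  # the settings module named at the top of the script
from scrapy.crawler import Crawler
from scrapy.settings import CrawlerSettings

# Pass the settings module explicitly instead of CrawlerSettings()...
settings = CrawlerSettings(settings_module)
# ...or force the pipeline on via overrides (the dotted path is assumed).
settings.overrides['ITEM_PIPELINES'] = ['project.pipelines.JsonWithEncodingPipeline']

crawler = Crawler(settings)
crawler.configure()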

Thank you all! I have found the solution; see the comments below.

user2823667
  • I had the same issue and didn't get any help: http://stackoverflow.com/questions/15483898/calling-scrapy-from-a-python-script-not-creating-json-output-file – Pdksock Sep 29 '13 at 18:55
  • Do you have any idea how to do it? – user2823667 Sep 30 '13 at 06:13
  • No. I guess there is an issue with Scrapy itself. – Pdksock Sep 30 '13 at 06:17
  • I got it. My Python script is like this: – user2823667 Sep 30 '13 at 07:02
  • (code posted as a comment, reflowed here for readability; a tidied version appears after these comments)

        from scrapy.crawler import CrawlerProcess
        from multiprocessing import Process
        from aqaq.spiders.image import aqaqspider

        def handleSpiderIdle(spider):
            reactor.stop()

        mySettings = {'LOG_ENABLED': True,
                      'ITEM_PIPELINES': ['aqaq.pipelines.JsonWithEncodingPipeline',
                                         'scrapy.contrib.pipeline.images.ImagesPipeline']}
        # global settings http://doc.scrapy.org/topics/settings.html
        settings.overrides.update(mySettings)

        crawlerProcess = CrawlerProcess(settings)
        crawlerProcess.install()
        crawlerProcess.configure()
        spider = nameofspider()  # create a spider ourselves

    – user2823667 Sep 30 '13 at 07:03
  • @user2823667, if you found the answer yourself, consider posting it below – paul trmbrth Oct 01 '13 at 16:52
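
For readers landing here, below is a tidied, self-contained version of the snippet from the comment above. It keeps the old (pre-1.0) Scrapy API the question uses; the scrapy.conf import and the final crawl()/start() calls are additions not shown in the comment, aqaqspider stands in for the nameofspider() placeholder, and the unused multiprocessing import and handleSpiderIdle helper are omitted.

# Tidied sketch of the solution posted in the comments (old, pre-1.0 Scrapy API).
from scrapy.conf import settings          # assumed source of the `settings` object
from scrapy.crawler import CrawlerProcess
from aqaq.spiders.image import aqaqspider

mySettings = {
    'LOG_ENABLED': True,
    'ITEM_PIPELINES': [
        'aqaq.pipelines.JsonWithEncodingPipeline',
        'scrapy.contrib.pipeline.images.ImagesPipeline',
    ],
}
settings.overrides.update(mySettings)     # enable the pipelines explicitly

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

spider = aqaqspider(domain='aqaq.com')    # create the spider ourselves
crawlerProcess.crawl(spider)              # assumed, as in the original script
crawlerProcess.start()                    # runs the reactor and blocks until the crawl finishes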

0 Answers