ENVIRONMENT: Windows 7, Python 3.6.5, Scrapy 1.5.1

PROBLEM DESCRIPTION:

I have a Scrapy project called project_github, which contains 3 spiders: spider1, spider2, spider3. Each spider scrapes data from its own website.

I am trying to automatically export a JSON file whenever a spider is executed, named in the format NameOfSpider_TodaysDate.json, so that from the command line I can:

Run scrapy crawl spider1, which produces spider1_181115.json

Currently I configure the feed export in settings.py with the following code:

import datetime
FEED_URI = 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json'
FEED_FORMAT = 'json'
FEED_EXPORTERS = {'json': 'scrapy.exporters.JsonItemExporter'}
FEED_EXPORT_ENCODING = 'utf-8'

Obviously this code always writes spider1_TodaysDate.json regardless of the spider used... Any suggestions?

johnnydoe

1 Answer

The way to do this is to define custom_settings as a class attribute on the specific spider we are writing the item exporter for. Spider settings override project settings.

So, for spider1:

import datetime

import scrapy


class spider1(scrapy.Spider):
    name = "spider1"
    allowed_domains = []

    # Per-spider settings; these take precedence over settings.py.
    custom_settings = {
        # Produces e.g. spider1_181115.json
        'FEED_URI': 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json',
        'FEED_FORMAT': 'json',
        'FEED_EXPORTERS': {
            'json': 'scrapy.exporters.JsonItemExporter',
        },
        'FEED_EXPORT_ENCODING': 'utf-8',
    }
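
To avoid copying the same dictionary into spider2 and spider3, the settings can be built from the spider name. This is only a sketch: feed_settings is a hypothetical helper (not part of Scrapy), and its optional day argument is an assumption added here so the result can be checked against a fixed date.

```python
import datetime


def feed_settings(spider_name, day=None):
    """Return feed-export settings for one spider (hypothetical helper, not a Scrapy API)."""
    if day is None:
        day = datetime.date.today()
    return {
        # NameOfSpider_TodaysDate.json, with the date as yymmdd
        'FEED_URI': '{}_{}.json'.format(spider_name, day.strftime('%y%m%d')),
        'FEED_FORMAT': 'json',
        'FEED_EXPORTERS': {'json': 'scrapy.exporters.JsonItemExporter'},
        'FEED_EXPORT_ENCODING': 'utf-8',
    }


# Fixed date used only to make the example reproducible.
example = feed_settings('spider1', datetime.date(2018, 11, 15))
print(example['FEED_URI'])  # spider1_181115.json
```

Each spider would then set custom_settings = feed_settings('spider1') (likewise 'spider2', 'spider3') in its class body. The date is evaluated when the spider module is imported, which for a scrapy crawl run is the start of that run.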

Jakub Kukul
johnnydoe
  • Just today I had an issue with printing the date and did it the way you do, until I found a shorter way: `str(datetime.date.today())` – zhanymkanov Nov 23 '19 at 17:13
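
Note that the shorter call in the comment above gives a different date format from the one used in the question and answer, so the resulting filename would change. A small comparison, using a fixed date (an assumption for reproducibility; in practice it would be datetime.date.today()):

```python
import datetime

# Fixed date so the output is reproducible.
day = datetime.date(2018, 11, 15)

compact = day.strftime('%y%m%d')  # format used in the question and answer
iso = str(day)                    # shorter call, ISO 8601 format

print(compact)  # 181115
print(iso)      # 2018-11-15
```

Either string works as part of a filename; the choice only affects whether the file is named spider1_181115.json or spider1_2018-11-15.json.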