8

I have a Scrapy project with multiple spiders and multiple pipelines. Is there a way to tell Spider A to use pipeline A, Spider B to use pipeline B, and so on?

My pipelines.py has multiple pipeline classes, each doing something different, and I want to be able to tell a spider to use a specific pipeline.

I do not see any obvious way to do this among the available scrapy commands.

xXPhenom22Xx
  • 1,265
  • 5
  • 29
  • 63
  • You can read an answer with an example here: http://stackoverflow.com/questions/8372703/how-can-i-use-different-pipelines-for-different-spiders-in-a-single-scrapy-proje/38124696#38124696 – Nanhe Kumar Jun 30 '16 at 13:45

2 Answers

14

It is possible to specify the pipeline to use in the custom_settings property of your spider class:

import scrapy


class BookSpider(scrapy.Spider):
    name = "book_spider"

    # Overrides the project-wide ITEM_PIPELINES for this spider only.
    custom_settings = {
        'ITEM_PIPELINES': {
            'my_app.pipelines.BookPipeline': 300,
        }
    }

    def parse(self, response):
        return
sfenske
  • 153
  • 2
  • 8
  • What does the 300 number mean? – alexmulo Jul 18 '21 at 09:22
  • The 300 is the order/priority of the pipeline and decides the order in which the pipelines are invoked, similar to how the middleware works. It becomes useful when you have more pipelines for the same spider. – rosul Oct 08 '21 at 09:43
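
To make the ordering concrete, here is a minimal sketch in plain Python. It only mimics the rule that lower values run first; it is not Scrapy's internal code. `ExportPipeline` and `CleanupPipeline` are hypothetical names added for illustration; only `BookPipeline` comes from the answer above.

```python
# Hypothetical ITEM_PIPELINES value; only BookPipeline appears in the answer,
# the other two entries are made up to show the ordering.
ITEM_PIPELINES = {
    'my_app.pipelines.ExportPipeline': 800,
    'my_app.pipelines.BookPipeline': 300,
    'my_app.pipelines.CleanupPipeline': 100,
}

# Items pass through pipelines from the lowest number to the highest,
# so sorting the keys by their value gives the invocation order.
run_order = sorted(ITEM_PIPELINES, key=ITEM_PIPELINES.get)

for path in run_order:
    print(ITEM_PIPELINES[path], path)
```

By convention the numbers are kept in the 0 to 1000 range, which leaves room to slot a new pipeline between two existing ones.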
12

The ITEM_PIPELINES setting is defined globally for all spiders in the project when the engine starts. It cannot be changed per spider on the fly.

Here's what you can do: define which spiders a pipeline should handle inside the pipeline itself. In the pipeline's process_item method, check the spider's name and either pass the item through untouched or continue processing it, e.g.:

def process_item(self, item, spider):
    # Pass items from other spiders through untouched.
    if spider.name not in ['spider1', 'spider2']:
        return item

    # process item
    return item
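
For context, a fuller sketch of the same check inside a complete pipeline class might look like the following. The class name `SelectivePipeline`, the `allowed_spiders` attribute, and the `processed_by` field are assumptions made up for this example and are not part of Scrapy. The spiders are replaced by simple stand-in objects so the snippet runs on its own.

```python
from types import SimpleNamespace


class SelectivePipeline:
    """Item pipeline that only handles items from the listed spiders."""

    # Hypothetical attribute: names of the spiders this pipeline applies to.
    allowed_spiders = {'spider1', 'spider2'}

    def process_item(self, item, spider):
        if spider.name not in self.allowed_spiders:
            # Not one of our spiders: hand the item on unchanged.
            return item

        # Placeholder processing step; replace with real logic.
        item['processed_by'] = type(self).__name__
        return item


# Stand-ins for real spiders; only the `name` attribute is used here.
spider1 = SimpleNamespace(name='spider1')
spider3 = SimpleNamespace(name='spider3')

pipeline = SelectivePipeline()
handled = pipeline.process_item({'title': 'A'}, spider1)
skipped = pipeline.process_item({'title': 'B'}, spider3)
print(handled)
print(skipped)
```

Both branches return the item, so later pipelines in the chain still receive it.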

Hope that helps.

alecxe
  • 462,703
  • 120
  • 1,088
  • 1,195
  • Thanks a lot, that helps. I wasn't sure if there was a more formal way for a spider to select pipelines on the fly, but this will definitely do the trick. – xXPhenom22Xx Aug 03 '13 at 17:39
  • 3
    Yeah, also, you can define your own custom setting: a dictionary mapping each pipeline to the spiders that use it. E.g. `PIPELINE_SPIDERS={'name_of_the_pipeline': ['spider1', 'spider2'], ...}`. Then in your `process_item` method, you can check the setting and decide whether to continue or not. – alecxe Aug 03 '13 at 17:41
  • @alecxe For any future readers, something I got stuck on: I needed an `else: return item` at the end, otherwise you get errors that do not make much sense (at least I did). – Alex McLean Sep 03 '15 at 16:31
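
The `PIPELINE_SPIDERS` idea from the comments could be sketched roughly as below. This is one possible shape, not an established Scrapy feature: `PIPELINE_SPIDERS` is a custom setting you would define yourself in settings.py, and `SpiderAwarePipeline`, `handle`, `BookPipeline`, and `MoviePipeline` are hypothetical names. The sketch assumes the setting is read through `spider.settings.get(...)`. A plain dict stands in for the settings object here so the snippet runs without a crawler.

```python
from types import SimpleNamespace

# Hypothetical custom setting (would live in settings.py):
# pipeline class name -> names of the spiders it should process.
PIPELINE_SPIDERS = {
    'BookPipeline': ['spider1', 'spider2'],
    'MoviePipeline': ['spider3'],
}


class SpiderAwarePipeline:
    """Base class that skips items from spiders not mapped to this pipeline."""

    def process_item(self, item, spider):
        mapping = spider.settings.get('PIPELINE_SPIDERS', {})
        if spider.name not in mapping.get(type(self).__name__, []):
            # Always return the item, even when skipping it.
            return item
        return self.handle(item, spider)

    def handle(self, item, spider):
        # Subclasses override this with their real processing.
        return item


class BookPipeline(SpiderAwarePipeline):
    def handle(self, item, spider):
        item['kind'] = 'book'
        return item


# Stand-in spiders; a plain dict plays the role of the settings object.
settings = {'PIPELINE_SPIDERS': PIPELINE_SPIDERS}
spider1 = SimpleNamespace(name='spider1', settings=settings)
spider3 = SimpleNamespace(name='spider3', settings=settings)

book_item = BookPipeline().process_item({}, spider1)
untouched = BookPipeline().process_item({}, spider3)
print(book_item, untouched)
```

Keeping the mapping in one setting puts the spider-to-pipeline assignments in a single place, at the cost of every pipeline still being instantiated for every spider.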