I'm using scrapy and I have the following functioning pipeline class:

import traceback

import dataset
from sqlalchemy.exc import IntegrityError  # dataset is built on SQLAlchemy

from myproject import settings  # project module defining SETTINGS_PATH


class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "docket" parameter
        docket = getattr(crawler.spider, "docket")
        return cls(docket)

    def __init__(self, docket):
        try:
            db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
            db = dataset.connect(db_path)
            table_name = docket[0:3]  # first 3 letters of the docket
            self.my_table = db[table_name]
        except Exception:
            traceback.print_exc()

    def process_item(self, item, spider):
        try:
            self.my_table.insert(dict(item))
            print('INSERTED')
        except IntegrityError:
            print('THIS IS A DUP')
        return item  # pipelines should return the item for later stages

In my spider I have:

custom_settings = {
    'ITEM_PIPELINES': {
        'myproject.pipelines.DynamicSQLlitePipeline': 600,
    }
}
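
For context, `from_crawler` reads `docket` as a plain attribute on the spider. Here is a minimal sketch (the spider name and argument handling are my assumptions, not code from the project) of how that attribute can be set with a `-a` spider argument:

import scrapy

class DocketSpider(scrapy.Spider):
    # Hypothetical spider; run with: scrapy crawl docket_spider -a docket=ABC123
    name = "docket_spider"

    custom_settings = {
        'ITEM_PIPELINES': {
            'myproject.pipelines.DynamicSQLlitePipeline': 600,
        }
    }

    def __init__(self, docket=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Arguments passed with -a arrive as keyword arguments; storing the
        # value is what makes getattr(crawler.spider, "docket") work.
        self.docket = docket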

From a recent question I was pointed to "What is the 'cls' variable used for in Python classes?"

If I understand correctly, in order for the pipeline object to be instantiated (via `__init__`), it requires a docket number. The docket number only becomes available once the `from_crawler` class method runs. But what triggers the `from_crawler` method? Again, the code is working.

user1592380
  • `new_pipeline = DynamicSQLlitePipeline.from_crawler(crawler)` – Stephen Rauch Apr 02 '18 at 01:01
  • Some other code that you haven't shown us is calling it by doing something like `DynamicSQLlitePipeline.from_crawler(crawler)`. Or, maybe, you're passing the name `DynamicSQLlitePipeline` into the crawler, it's storing it as `pipeline_type`, and later calling `pipeline_type.from_crawler(crawler)`. – abarnert Apr 02 '18 at 01:01
  • @abarnert added the entire pipeline class. – user1592380 Apr 02 '18 at 01:06
  • It's not in the pipeline class. The actual calling code is inside Scrapy, but you need some code to tell it which classes to construct and what order to connect them up in, and that's the code you haven't shown us. I've written an answer that tries to explain what's going on in general terms, but it would be a lot easier for you to understand if you gave us a [mcve]. – abarnert Apr 02 '18 at 01:08

1 Answer


The caller of a classmethod has to have access to the class. They may just access it by name, like this:

DynamicSQLlitePipeline.from_crawler(crawler)

… or:

sqlitepipeline.DynamicSQLlitePipeline.from_crawler(crawler)

Or maybe you pass the class object to someone, and they store it and use it later like this:

pipelines[i].from_crawler(crawler)
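
To see why no instance is needed beforehand, here is a minimal, self-contained sketch (the `Fake*` classes are made-up stand-ins for what Scrapy would pass in) showing the classmethod acting as an alternate constructor:

class DynamicPipeline:
    @classmethod
    def from_crawler(cls, crawler):
        # cls is the class itself; no instance exists yet.
        docket = getattr(crawler.spider, "docket")
        return cls(docket)  # this call is what runs __init__

    def __init__(self, docket):
        self.docket = docket

# Stand-ins for the objects Scrapy would normally provide:
class FakeSpider:
    docket = "ABC123"

class FakeCrawler:
    spider = FakeSpider()

pipeline = DynamicPipeline.from_crawler(FakeCrawler())
print(pipeline.docket)  # -> ABC123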

In Scrapy, the usual way to register a set of pipelines with the framework, according to the docs, is like this:

ITEM_PIPELINES = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}

(Also see the Extensions user guide, which explains how this fits into a scrapy project.)

You've done exactly that in the `custom_settings` shown above, putting `'myproject.pipelines.DynamicSQLlitePipeline'` in that dict. At some point, Scrapy goes through that dict, sorts it in order by the values, and instantiates each pipeline. (Because it has the name of the class as a string, instead of the class object, this is a little trickier, but the details really aren't relevant here.)
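
In rough outline, the framework-side logic looks something like this sketch (illustrative only, not Scrapy's actual source; `load_object` and `build_pipelines` here are hand-rolled stand-ins):

from importlib import import_module

def load_object(path):
    # Turn a dotted string like 'myproject.pipelines.PricePipeline'
    # into the class object it names.
    module_path, _, name = path.rpartition('.')
    return getattr(import_module(module_path), name)

def build_pipelines(crawler, item_pipelines):
    pipelines = []
    # Sort by the integer priority values, lowest first.
    for path in sorted(item_pipelines, key=item_pipelines.get):
        cls = load_object(path)
        if hasattr(cls, 'from_crawler'):
            # This is the call that "triggers" from_crawler.
            pipelines.append(cls.from_crawler(crawler))
        else:
            pipelines.append(cls())
    return pipelines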

abarnert
  • @user61629 Cool. I was sure there was a place that explains this rather than just giving a reference, but I wasn't sure where it was. So I've edited your find into the answer. Thanks for finding it. – abarnert Apr 02 '18 at 01:28
  • Thank you, I also found the following helpful: https://doc.scrapy.org/en/latest/topics/extensions.html#writing-your-own-extension. I don't 100% understand the scrapy API entry point, but I'm assuming the from_crawler class method is called by scrapy prior to instantiating each extension (including pipelines). – user1592380 Apr 02 '18 at 01:30
  • @user61629 Most of the guts of scrapy aren't that complicated, so if you want to find out more of what it does, you can always read [the source](https://github.com/scrapy/scrapy). If you grep for `ITEM_PIPELINES` it should be easy to find. – abarnert Apr 02 '18 at 01:32