I'm using Scrapy and I have the following working pipeline class:
import dataset
# Assumed source of IntegrityError (dataset is built on SQLAlchemy)
from sqlalchemy.exc import IntegrityError
# `settings` is the project's own settings module (its import is not shown here)

class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "docket" spider argument
        docket = getattr(crawler.spider, "docket")
        return cls(docket)

    def __init__(self, docket):
        try:
            db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
            db = dataset.connect(db_path)
            table_name = docket[0:3]  # first 3 letters
            self.my_table = db[table_name]
        except Exception:
            # traceback.print_exc()
            pass

    def process_item(self, item, spider):
        try:
            test = dict(item)
            self.my_table.insert(test)
            print('INSERTED')
        except IntegrityError:
            print('THIS IS A DUP')
        return item  # hand the item on to any later pipeline
In my spider I have:
    custom_settings = {
        'ITEM_PIPELINES': {
            'myproject.pipelines.DynamicSQLlitePipeline': 600,
        }
    }
In a recent question I was pointed to "What is the 'cls' variable used for in Python classes?"
If I understand correctly, the pipeline object can only be instantiated (via __init__) once it has a docket number, and the docket number only becomes available when the from_crawler class method runs. But what triggers the from_crawler method? Again, the code is working.
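To make the question concrete, here is a minimal sketch of how I picture the mechanism. It is not Scrapy's actual code: FakeSpider, FakeCrawler, DocketPipeline and build_component are hypothetical stand-ins, and the docket value is made up. The assumption is that the framework, for each class listed in ITEM_PIPELINES, checks whether the class defines from_crawler and calls it instead of calling the constructor directly.

```python
# Simplified stand-ins; none of these names are part of Scrapy's API.
class FakeSpider:
    def __init__(self, docket):
        self.docket = docket


class FakeCrawler:
    def __init__(self, spider):
        self.spider = spider


class DocketPipeline:
    def __init__(self, docket):
        self.table_name = docket[0:3]

    @classmethod
    def from_crawler(cls, crawler):
        # cls is DocketPipeline itself, so cls(docket) ends up calling __init__(docket)
        return cls(getattr(crawler.spider, "docket"))


def build_component(component_cls, crawler):
    """Hypothetical helper: what a framework might do for each configured class."""
    if hasattr(component_cls, "from_crawler"):
        # The framework, not my own code, is the caller of from_crawler
        return component_cls.from_crawler(crawler)
    return component_cls()


crawler = FakeCrawler(FakeSpider("ABC-123"))  # made-up docket value
pipeline = build_component(DocketPipeline, crawler)
print(pipeline.table_name)  # ABC
```

If this picture is right, from_crawler is never called from my own code; whatever builds the pipelines from the settings calls it, which would explain why the class works without my ever invoking it.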