I'm working with Scrapy. I have a spider that starts with:
    class For_Spider(Spider):
        name = "for"
        table = 'hello'  # creating dummy attribute. will be overwritten

        def start_requests(self):
            self.table = self.dc  # dc is passed in
I have the following pipeline:
    class DynamicSQLlitePipeline(object):

        @classmethod
        def from_crawler(cls, crawler):
            # Here, you get whatever value was passed through the "table" parameter
            table = getattr(crawler.spider, "table")
            return cls(table)

        def __init__(self, table):
            try:
                db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
                db = dataset.connect(db_path)
                table_name = table[0:3]  # FIRST 3 LETTERS
                self.my_table = db[table_name]
I start the spider with:

    scrapy crawl for -a dc=input_string -a records=1
After stepping through the execution repeatedly, and with help from questions like "What is the relationship between the crawler object with spider and pipeline objects?", it appears that the order of execution is:
1) For_Spider
2) DynamicSQLlitePipeline
3) start_requests
The spider's "table" attribute is passed to the DynamicSQLlitePipeline object by the from_crawler method, which has access to the different components of the Scrapy system. At that point table still holds "hello", the dummy value that I set. After steps 1 and 2, execution returns to the spider and start_requests begins. As far as I can tell, the command-line parameters only become available inside start_requests, so it is too late to set the table name dynamically, because the pipeline has already been instantiated.
Therefore I don't know whether there is a way to set the pipeline's table name dynamically. How can I do this?
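To make the ordering concrete, here is a minimal sketch that uses plain-Python stand-ins rather than Scrapy itself. It assumes two things about Scrapy that are worth checking against the installed version: that the base Spider constructor copies the `-a` keyword arguments onto the instance, and that the spider is built before the pipeline's from_crawler runs. StandInSpider, StandInCrawler and the table_name attribute are hypothetical names introduced only for this illustration. Under those assumptions, the pipeline could read dc directly instead of waiting for start_requests to overwrite table.

```python
# Stand-ins only: these classes imitate the order of events; they are not Scrapy's own.

class StandInSpider:
    """Imitates a spider whose constructor copies keyword arguments onto the instance."""
    name = "for"
    table = "hello"  # dummy class attribute, as in the question

    def __init__(self, **kwargs):
        # Assumed behaviour: "-a dc=input_string" ends up as self.dc
        self.__dict__.update(kwargs)


class StandInCrawler:
    """Hypothetical holder that builds the spider before any pipeline exists."""

    def __init__(self, spider_cls, **spider_kwargs):
        self.spider = spider_cls(**spider_kwargs)


class DynamicSQLlitePipeline:
    @classmethod
    def from_crawler(cls, crawler):
        # Read the command-line argument itself, not the dummy "table" attribute.
        return cls(getattr(crawler.spider, "dc"))

    def __init__(self, table):
        self.table_name = table[0:3]  # first 3 letters, as in the original pipeline


crawler = StandInCrawler(StandInSpider, dc="input_string", records="1")
pipeline = DynamicSQLlitePipeline.from_crawler(crawler)
print(pipeline.table_name)  # inp
```

Note that in this sketch start_requests is never involved: the value is already on the spider instance by the time from_crawler is called.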
edit:
elRuLL is correct, and his solution works. I looked through the spider object in step 1 and did not find any parameters listed in the spider. Am I missing them?
    >>> Spider.__dict__
    mappingproxy({'__module__': 'scrapy.spiders', '__doc__': 'Base class for scrapy spiders. All spiders must inherit from this\n class.\n ', 'name': None, 'custom_settings': None, '__init__': <function Spider.__init__ at 0x00000000047A6D90>, 'logger': <property object at 0x0000000003E0E598>, 'log': <function Spider.log at 0x00000000047A6EA0>, 'from_crawler': <classmethod object at 0x0000000003B28278>, 'set_crawler': <function Spider.set_crawler at 0x00000000047C9048>, '_set_crawler': <function Spider._set_crawler at 0x00000000047C90D0>, 'start_requests': <function Spider.start_requests at 0x00000000047C9158>, 'make_requests_from_url': <function Spider.make_requests_from_url at 0x00000000047C91E0>, 'parse': <function Spider.parse at 0x00000000047C9268>, 'update_settings': <classmethod object at 0x0000000003912C88>, 'handles_request': <classmethod object at 0x0000000003E0B7F0>, 'close': <staticmethod object at 0x0000000004756BA8>, '__str__': <function Spider.__str__ at 0x00000000047C9488>, '__repr__': <function Spider.__str__ at 0x00000000047C9488>, '__dict__': <attribute '__dict__' of 'Spider' objects>, '__weakref__': <attribute '__weakref__' of 'Spider' objects>})
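One possible explanation for not seeing the parameters: the output above is the `__dict__` of the Spider class, which only lists class-level attributes and methods, while arguments passed at construction time would live in the `__dict__` of the spider instance. The sketch below shows that difference in plain Python. MiniSpider is a hypothetical stand-in, not Scrapy's Spider, and it assumes the constructor stores its keyword arguments on the instance.

```python
class MiniSpider:
    """Hypothetical stand-in for a spider class; not part of Scrapy."""
    name = None  # class attribute: appears in MiniSpider.__dict__

    def __init__(self, **kwargs):
        # Instance attributes: appear in vars(instance), not in the class dict
        self.__dict__.update(kwargs)


spider = MiniSpider(dc="input_string", records="1")

print("dc" in MiniSpider.__dict__)  # False
print(vars(spider))                 # {'dc': 'input_string', 'records': '1'}
```

If the same holds for the real spider, inspecting the instance (for example `vars(crawler.spider)` inside from_crawler) rather than the class should show dc and records.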