I've written a few spiders that pull similar data from different sources. I've also written a pipeline that allows this data to be put in a database. I want to be able to use the same code for multiple spiders to output to different tables, named dynamically from the spider name.
Here is the pipeline.py
code:
class DbPipeline(object):
def __init__(self):
"""
Initialises database connection and sessionmaker.
Creates table if it doesn't exist.
"""
engine = db_connect()
create_output_table(engine)
self.Session = sessionmaker(bind=engine)
def process_item(self, item, spider):
"""
Saves scraped products in database
"""
exists = self.check_item_exists(item)
if not exists:
session = self.Session()
product = Products(**item)
try:
session.add(product)
session.commit()
except:
session.rollback()
raise
finally:
session.close()
return item
def check_item_exists(self,item):
session = self.Session()
product = Products(**item)
result = session.query(Products).filter(Products.title == item['title']).first()
return result is not None
And here is the model.py
file:
DeclarativeBase = declarative_base()
def create_output_table(engine):
DeclarativeBase.metadata.create_all(engine)
def db_connect():
"""
Connects to database from settings defined in settings.py
Returns an sqlalchemy engine instance
"""
return create_engine(URL(**settings.DATABASE))
class Products(DeclarativeBase):
"""Sqlalchemy table model"""
__tablename__ = "name"
id = Column(Integer, primary_key=True)
title = Column('title', String(200))
price = Column('price', String(10), nullable=True)
url = Column('url', String(200), nullable=True)
What i'm trying to do is get the __tablename__
variable to be the same as the spider name, which I can easily do in the process_item
function as it is passed a spider
object and can use spider.name
and assign it to a class variable, however the function will run after the table is created/defined. How can I go about getting the spider name outside of the process_item
function in the pipelines.py
file?
Edit: I've tried the solutions listed in How to access scrapy settings from item Pipeline however access to the 'settings' doesn't give me access to the attributes assigned to the current spider running. I need to dynamically get the name of the spider based on what spider is running the pipelines. Thanks