I have a database with a star-schema architecture that I want to represent in SQLAlchemy, and I'm not sure how to do this in the best possible way. Right now I have a lot of properties with custom join conditions, because the data is stored in different tables. It would be nice if the dimensions could be re-used for different fact tables, but I haven't figured out how to do that nicely.
1 Answer
A typical fact table in a star schema contains foreign key references to all dimension tables, so usually there wouldn't be any need for custom join conditions - they are determined automatically from foreign key references.
For example, a star schema with two fact tables could look like this:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# dimension tables
class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

class Product(Base):
    __tablename__ = 'product'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

# fact tables: the foreign keys to the dimensions form the composite primary key,
# and the relations need no explicit join conditions
class FactOne(Base):
    __tablename__ = 'sales_fact_one'

    store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
    product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
    units_sold = Column('units_sold', Integer, nullable=False)

    store = relation(Store)
    product = relation(Product)

class FactTwo(Base):
    __tablename__ = 'sales_fact_two'

    store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
    product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
    units_sold = Column('units_sold', Integer, nullable=False)

    store = relation(Store)
    product = relation(Product)
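A minimal usage sketch along those lines (assuming an in-memory SQLite engine) shows both fact tables reaching the same dimension rows, with the joins derived purely from the foreign keys:

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
sess = sessionmaker(bind=engine)()

store = Store(name='s1')
product = Product(name='p1')
sess.add(FactOne(store=store, product=product, units_sold=10))
sess.add(FactTwo(store=store, product=product, units_sold=20))
sess.commit()

# the shared 'store' dimension is joined from either fact table without custom join conditions
print(sess.query(FactOne).join('store').filter(Store.name == 's1').count())
print(sess.query(FactTwo).join('store').filter(Store.name == 's1').count())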
But suppose you want to reduce that boilerplate anyway. I'd create generators, local to the dimension classes, which configure themselves onto a fact table:
class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def add_dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)
in which case usage would be like:
class FactOne(Base):
    # ...

Store.add_dimension(FactOne)
But there's a problem with that: assuming the dimension columns you're adding are primary key columns, the mapper configuration will fail, since a class needs to have its primary key columns in place before the mapping is set up. So, assuming we're using declarative (which, as you'll see below, has a nice effect), to make this approach work we'd have to use the instrument_declarative() function instead of the standard declarative metaclass:
from sqlalchemy.ext.declarative import instrument_declarative

meta = MetaData()
registry = {}

def register_cls(*cls):
    for c in cls:
        instrument_declarative(c, registry, meta)
So then we'd do something along the lines of:
class Store(object):
    # ...

class FactOne(object):
    __tablename__ = 'sales_fact_one'

Store.add_dimension(FactOne)
register_cls(Store, FactOne)
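Pulled together, a complete sketch of this non-decorator variant (same names as above, with the fact table keeping only its own measure column) would look roughly like this:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import instrument_declarative

meta = MetaData()
registry = {}

def register_cls(*cls):
    for c in cls:
        instrument_declarative(c, registry, meta)

class Store(object):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def add_dimension(cls, target):
        # the dimension contributes its foreign key column and its relation to the fact class
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)

class FactOne(object):
    __tablename__ = 'sales_fact_one'

    units_sold = Column('units_sold', Integer, nullable=False)

# add the dimension columns first, then instrument; the primary key is complete by mapping time
Store.add_dimension(FactOne)
register_cls(Store, FactOne)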
If you actually have a good reason for custom join conditions, then as long as there's some pattern to how those conditions are created, you can generate them in your add_dimension():
class Store(object):
    # ...

    @classmethod
    def add_dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls, primaryjoin=target.store_id == cls.id)
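A typical case where such a pattern appears is a 'role-playing' dimension, e.g. a date dimension that the same fact table references twice. Purely as a hypothetical sketch (DateDim and the role names are made up here, the imports and register_cls() helper are the ones from above, and FactOne is assumed not to be instrumented yet), add_dimension() could take the role as a parameter:

class DateDim(object):
    __tablename__ = 'date_dim'

    id = Column('id', Integer, primary_key=True)
    day = Column('day', Date, nullable=False)

    @classmethod
    def add_dimension(cls, target, role):
        # one foreign key column and one relation per role, e.g. 'order' and 'ship'
        fk = Column('%s_date_id' % role, Integer, ForeignKey('date_dim.id'), primary_key=True)
        setattr(target, '%s_date_id' % role, fk)
        setattr(target, '%s_date' % role, relation(cls, primaryjoin=fk == cls.id))

DateDim.add_dimension(FactOne, 'order')
DateDim.add_dimension(FactOne, 'ship')
register_cls(DateDim, FactOne)

Each role gets its own foreign key column and its own relation, and the explicit primaryjoin keeps the two relations to the same dimension table from colliding.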
But the final cool thing, if you're on Python 2.6, is to turn add_dimension into a class decorator. Here's an example with everything cleaned up:
from sqlalchemy import *
from sqlalchemy.ext.declarative import instrument_declarative
from sqlalchemy.orm import *

class BaseMeta(type):
    # collects every subclass of Base so they can all be instrumented at once later
    classes = set()

    def __init__(cls, classname, bases, dict_):
        klass = type.__init__(cls, classname, bases, dict_)
        if 'metadata' not in dict_:
            BaseMeta.classes.add(cls)
        return klass

class Base(object):
    __metaclass__ = BaseMeta
    metadata = MetaData()

    def __init__(self, **kw):
        for k in kw:
            setattr(self, k, kw[k])

    @classmethod
    def configure(cls, *klasses):
        # instrument all collected classes; by now every fact class has its
        # dimension columns (including its primary key columns) in place
        registry = {}
        for c in BaseMeta.classes:
            instrument_declarative(c, registry, cls.metadata)

class Store(Base):
    __tablename__ = 'store'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def dimension(cls, target):
        target.store_id = Column('store_id', Integer, ForeignKey('store.id'), primary_key=True)
        target.store = relation(cls)
        return target

class Product(Base):
    __tablename__ = 'product'

    id = Column('id', Integer, primary_key=True)
    name = Column('name', String(50), nullable=False)

    @classmethod
    def dimension(cls, target):
        target.product_id = Column('product_id', Integer, ForeignKey('product.id'), primary_key=True)
        target.product = relation(cls)
        return target

@Store.dimension
@Product.dimension
class FactOne(Base):
    __tablename__ = 'sales_fact_one'
    units_sold = Column('units_sold', Integer, nullable=False)

@Store.dimension
@Product.dimension
class FactTwo(Base):
    __tablename__ = 'sales_fact_two'
    units_sold = Column('units_sold', Integer, nullable=False)

Base.configure()

if __name__ == '__main__':
    engine = create_engine('sqlite://', echo=True)
    Base.metadata.create_all(engine)

    sess = sessionmaker(engine)()
    sess.add(FactOne(store=Store(name='s1'), product=Product(name='p1'), units_sold=27))
    sess.commit()

- Very nice design - beautiful! – Philipp der Rautenberg Jun 05 '12 at 12:36
- Inspired by this, I finally figured out how to pass configuration to declared_attr, so that libraries can be made aware of host application models: https://gist.github.com/miohtama/844cc78bcf1d317e31ca – Mikko Ohtamaa Jun 07 '15 at 04:07
- This looks very elegant, but it's now 10 years later, I'm using Python 3.6, and it doesn't work for me. I changed the __init__ of BaseMeta to `super().__init__(classname, bases, dict_)` to make it compatible with Python 3.6, but I'm obviously missing something, because I get the error: sqlalchemy.exc.InvalidRequestError: Don't know how to join from scenario; please use select_from() to establish the left entity/selectable of this join – Cameron Oliver Oct 30 '19 at 10:59