1

I'm starting to learn about data warehouses and have run across a scenario I can't wrap my head around. I was attempting to sketch out a data warehouse that would track monthly sales for different stores. If the source OLTP database for stores has a many-to-many relationship between store's attributes and store, how would I represent the store dimension in the data warehouse?

Dimension's Source OLTP Database:

enter image description here

Data Warehouse ERD - How would the many-to-many attribute relationhip be represented:

enter image description here

Christophe
  • 68,716
  • 7
  • 72
  • 138
Ben M
  • 11
  • 2
  • Hi - can you explain what those attributes are? Just having attribute names with no values looks a bit odd. Roughly how many attributes does each store have? Is there a "pool" of attributes some of can be assigned to a store or are these ad hoc attributes that can be created/assigned randomly? – NickW Jun 23 '22 at 10:35
  • Apologies for the vagueness. I'll try to calrify, suppose the attribute is store_type and possible values are drive through, table service, take out, counter service. Any store could be just one store_type ex) a store that only has drive through, or a store could be multiple store_types ex) a store that has a drive through and counter service. – Ben M Jun 28 '22 at 18:32

1 Answers1

0

If your database uses a star-schema build around the fact table, it could be tempting as a first guess to bring the attributes into the fact table:

  • This would make sense if the attributes would be unique in relation to the elementary metrics, i.e. if each unit sold could be unambiguously assigned to one value of that attribute. This would make sense if there would be one attribute for each store, or for each product, since the summation of metrics in the fact table would stay consistent.
  • This does not make sense if each unit sold could be assigned to multiple values of the same attribute at the same time. In fact the attributes here remain derived from the store and there is no way to bring this in the facts without either losing some relationship or duplicating units in the metrics. So you'd end-up keeping them related to the store dimension. As a consequence, you cannot break down the metrics according to these attributes, you could only filter according to these attributes.

The latter approach does not allow to really handle the attributes in the same way as dimensions. A slightly different approach could be to go for a dimension attribute_combination:

  • You would then create a unique id for each unique combination of attributes that you encounter in a store. In this case, you could derive from the store dimension this attribute_combination and use it as a dimension.
Christophe
  • 68,716
  • 7
  • 72
  • 138