Power BI should we have thick or thin branches in star schema model

Question

This question has no satisfactory answer. Please feel encouraged to answer it or comment.

Let's consider the following data model. We have three dimensions in our model. If you need to name them, be it (A) product, (B) brand, (C) region. B is a container for A, so it is a hierarchy. Many products in one brand. The tables denoted as A, B, C, AB, ABC are bridge tables which contain only unique values.

Now the questions:

Is the AB bridge table necessary in the following model? Couldn't we connect A and B tables directly to ABC.
Is it a good idea to create Cartesian product for all the dimensions in the model as a Central Bridge Table?
Should we plug Budget table with AB dimensions to bridge AB or to bridge ABC? Depending on what is the answer to first question.
How should we plug Advertizing table to the model? To bridge ABC or to especially created bridge table BC and that one connect to ABC?

Now the schema:

   +-------+
   |       |
   |  A    +-----+
   |       |     |
   +-------+     |
                 |
                 v
   +-------+  +--+----+      +--------+      +------------+
   |       |  |       |      |        |      | Sales      |
   |  B    +-->  AB   +----->|  ABC   +----->|   ABC      |
   |       |  |       |      |        |      |            |
   +-------+  +--+----+      +---+----+      +------------+
                                 ^
                                 |
   +-------+                     |           +------------+
   |       |                     |           | Budget     |
   |  C    +---------------------+           |   AB       |
   |       |                                 |            |
   +-------+                                 +------------+


                                             +------------+
                                             | Advertizing|
                                             |   BC       |
                                             |            |
                                             +------------+

DAX bridging.

I love to construct bridge tables in DAX, not in M. There a few reasons for that. First, it is done with simple code. Second, it introduces sort of tidiness to Query Editor because I see there only source tables (not bridges).

Bridge tables - DAX or M?

So creating a Bridge for A dimension looks like this:

#A =
DISTINCT (
    UNION (
        TOPN ( 0, ROW ( "A", "Apple" ) ),
        DISTINCT ( Sales[A] ),
        DISTINCT ( Budget[A] ),
        DISTINCT ( Advertizing[A] )
    )
)

Bridge for AB would be a Cartesian product of so created A and B:

AB =
CROSSJOIN (
    DISTINCT ( '#A'[A] ),
    DISTINCT ( '#B'[B] ),
    "A@B", COMBINEVALUES("@",'#A'[A], '#B'[B])
)

Update after receiving the first answer.

I don't want to edit the content of my question once the bounty has started. After the first answer, I realized that hierarchies came to my question by accident and distract you from what I would like to find out. You can forget about hierarchies and treat dimensions A, B, C as independent dimensions.

I would like to focus on how to build a star schema in case we have many independent dimensions and some tables, say dictionaries with joined dimensions. For example, we can have a sales budget defined by region and brand, and an advertising budget defined by product_color. Should we build a central bridge table that has all dimensions (Cartesian product of ABC)? Or alternatively, should the central bridge table have thickbranches with many dimensions? In the second case we would have [AB] -> [ABC] <- [BC].

Kosuke Sakai · Accepted Answer · 2019-11-10T21:28:31.047

Updated in Nov 6, 2019 as per the comment from OP

Is the AB bridge table necessary in the following model? Couldn't we connect A and B tables directly to ABC.

No. There is no necessity of bridge tables like AB and ABC. With a model like this, where there are multiple fact tables, it is recommended to build your model with multiple star schema. Simply make direct one-to-many relationships between dimension tables and fact tables, such as A -> Sales, B -> Sales, A -> Budget, and B -> Budget. Notice when you look at each single fact table, the fact table and all related dimension tables form a star schema.

Is it a good idea to create Cartesian product for all the dimensions in the model as a Central Bridge Table?

No. It is just redundant to Cartesian product all dimension tables into one big dimension table (let's refer to as "joint dimension table").

Bridge table between two dimensions will be needed typically when there is many-to-many relationship between the dimensions. For example, when Customer may belong to multiple Category, a bridge table Customer Category would be needed. The scenario presented by OP is not a use case of the bridge table.

Disadvantages of joint dimension table are,

It requires extra data storage. If A, B, C have 100, 100, 1000 rows respectively, the joint dimension table will have 10 million rows. Suppose if you turned afterwards to add a new dimension with 100 rows, the number of dimension rows will be 1 billion! This is not economical.
It requires extra computation. When we want Sales filtered by A, the engine first needs to scan over joint dimension table to extract rows that match the filtered value of A, which would potentially be a huge number of rows, then the engine scans Sales fact table with the relationship key that is contained in the extracted joint dimension table rows. This might work only when the size of dimensions is very small and the fact table is very large. But in many cases, the performance will be desperate.
It is an irrelevant representation of business data. I think this is the biggest disadvantage. In your model, Budget is only defined in the granularity of dimensions A and B. It is nonsense to think of a Budget that belongs to an instance of C. However, in order to make relationship between joint dimension table and Budget, we need to tweak Budget to make it related to a specific instance of C.

Should we plug Budget table with AB dimensions to bridge AB or to bridge ABC? Depending on what is the answer to first question.

Budget should be related directly to A and B. Because the granularity of Budget is each A and B in your model.

How should we plug Advertizing table to the model? To bridge ABC or to especially created bridge table BC and that one connect to ABC?

Make relationships B -> Advertising and C -> Advertising.

BTW, there is actually no many-to-many relationship in your model. There might be multiple Sales records related to a Product, but each Sales record has only one Product, so the relationship between Product and Sales is one-to-many. Same applies to the other relationships in the model.

It would be better described as "multiple fact tables with different granularity".

Added in Nov 6, 2019 as per the comment from OP

It looks like OP is confused about how to handle multiple fact tables with different granularity. I suggest OP will benefit from this article by Marco Russo, but I try to summarize the essence here.

Basically, the model OP presents can be simplified into a star schema model, where the fact tables Sales, Budget, and Advertising will be placed in the center of different stars.

Problem resides in that some dimension tables are shared by different fact tables, while some dimensions are not shared. For example, dimensions A and B are shared by Sales and Budget, whereas C is only relevant with Sales. Let's think we are comparing Sales and Budget. When we drill down the report by C, what value should appear in Budget? The answer may vary depending on the business, but here let's think we expect Budget to be blank, because we have no Budget defined at the level of each C.

Generally accepted approach for this kind of scenario is, to check the filter context in the measure and return value only when it is filtered by relevant dimensions. For example, calculate the total Budget only when current context has no filter on C.

[Total Budget] :=
IF (
    NOT ( ISFILTERED ( 'C' ) ),
    SUM ( 'Budget'[Amount] )
)

References

Added in Nov 11, 2019

Analyzing Data with Power BI and Power Pivot for Excel Covers patterns and best practices of data modeling in detail.

Understand star schema and the importance for Power BI Illustrates the features and benefits of star schema. Also, it lists up other common modeling patterns.

Best Way for work with Multiple Fact Tables A Q&A thread in Microsoft Power BI Community Forum, where it is mentioned that link table is not a best practice to handle multiple fact tables.

There was a type error in my question no 1. Should be ABC (not AB). Please review your answer for that. — Przemyslaw Remin, Nov 05 '19 at 09:49
Please forget hierarchies for a moment. Imagine that A, B, C are independent dimensions. If you say NO to the first question, then the alternative is to have a central bridge table with big branches containing multiple dimensions, isn't it? In that case we would have [AB]->[ABC]<-[BC]. But in the second case how to connect [B] dimension? You would have to join it to both sides: [AB]<-[B]->[BC]. In terms of complexity, after adding [AC] to the central star [ABC] and then adding single dimensions [A] and [C], the net of relationships is going to look like a neural network. — Przemyslaw Remin, Nov 05 '19 at 10:34
While I am still digesting the last part of your answer, let me ask you. This question is the root of starting this thread. How would you filter Sales table if you want to see only the brands that have Budget in the Budget table (some brands do have budget while some do not). I needed a calculated column HasBudget (Yes/No) to place it on the slicer. That is why I thought of Cartesian of ABC. But now I think I am guessing your concept. You just create appropriate measures that can read slicer of A, B, C. In that case it would be even possible not to create any physical relationships. — Przemyslaw Remin, Nov 06 '19 at 09:58
Add `Total Budget` measure in **Filters** pane of your visual, and select "is not blank". — Kosuke Sakai, Nov 06 '19 at 16:57
No, the use of that option is not possible. The report should be for the standard of C level managers. Everything must be intuitive, self explanatory, obvious. Simple as slicers are. — Przemyslaw Remin, Nov 06 '19 at 17:26
OK, in that case, how about adding calculated column in Budget dimension table something like `HasBudget = IF( CALCULATE( COUNTROWS( Budget ) ) > 0, "Yes", "No" )`. Then you can create a slicer with it. — Kosuke Sakai, Nov 06 '19 at 17:51
Single dimensions are not enough to filer Sales table by all what is being on Promotion. There is simply a need to have combinations of dimensions as AB. There are a couple of ways to solve it but I am looking for best practice solution. — Przemyslaw Remin, Nov 10 '19 at 10:58
I'm speculating you got that idea because Product and Brand have hierarchical relationship. You commented to assume no hierarchy between dimensions, but it is more relevant to consider the relationship between dimensions in real world. You may denormalize `Product` and `Brand` in one table, where `Product` table has columns of `Brand` attributes in it. This is different from what you describe as `AB`, because the granularity is each `Product`, not the all combinations of `Product` and `Brand`. You will be able to filter `Sales` to only `Product` which has `Budget`. — Kosuke Sakai, Nov 10 '19 at 21:07
I do not mark your answer as accepted but I grant you the bounty. Thank you for all the contribution. — Przemyslaw Remin, Nov 12 '19 at 09:33

Power BI should we have thick or thin branches in star schema model

1 Answers1

References