1

This question is not limited to Power BI, but it will help me explain my problem.

If you have more than one table in Power BI, you can establish a relationship between them by dragging a column from one table to the other like this:

enter image description here

And you can edit that relationship by clicking the occuring line:

enter image description here

And by the way, here are the structures of the two tables:

# Table1
A,B
1,abc
2,def
3,ghi
4,jkl

# Table2
A,C
1,abc
1,def
2,ghi
3,ghit

This works fine since column A in Table1 consists of unique values and can work as a primary key. And now you can head over to the Report tab, set up two tables, and slice and dice at your hearts desire either by clicking directly under A in Table1, or by introducing a slicer:

enter image description here

But the thing is that you can do that without having established a relationship between the tables. Delete the relationshiop under Relationships and go back to Report and select Home > Manage Relationships to see what I mean:

enter image description here

As the dialog box says 'There are no relationships defined yet.' But you can still subset one table by making selections in the other just like before (EDIT: This statement has been proven wrong in the answer from RADO) . I do know that you can highlight the slicer and select Format > Edit Interactions and deselect the tables associated with the slicer. But I'm still puzzled by the whole thing.

So is there something happening under the hood here that I'm not aware of? Or is the relationship between tables really defined by the very contents of the tables - in the sence that the existence of related values accross tables with the existence of a potential primary key (be it natural or synthetic) makes it possible to query them using SQL, dplyr verbs or any other form of querying techniques. And that you really do not need an explicitly defined relationship?

Or put in another way, does the establishment of a Power BI table relationship have a SQL equivalent? Perhaps like the following:

CREATE TABLE Persons (
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (ID)
);

I'm sorry If I'm rambling a bit here, but I'm just very confused. And googling has so far only added to the confusion. So thank you for any insights!

vestland
  • 55,229
  • 37
  • 187
  • 305
  • 2
    It is defining a foreign key relationship. I don't know if PowerBI actually changes the underlying data model, though. – Gordon Linoff Oct 17 '18 at 18:20
  • @GordonLinoff thank you for your feedback! It's the details around the `defining a foreign key relationship` that has me confused here. Do you really need to define such a relationship at all in order to execute any type of query between the tables? Like a left join for example? Isn't the very existence of the relational data enough in itself? – vestland Oct 17 '18 at 18:26
  • 1
    No, I don't think that "the relationship between tables really defined by the very contents of the tables". The relationship needs to be established as you did first. I don't know PowerBi, but it may be even able to run a query "without relationship", what is called a "cross join". These queries come sometimes in handy (though not very often to me), and they don't require a foreign key. – The Impaler Oct 17 '18 at 18:26
  • 3
    You don't need to specify that `table1.col1` is a foreign key referencing `table2.col1` to join the two tables on those columns, no. Foreign keys let the database enforce consistency. You can't insert a new row in `table1` with a `col1` value that doesn't also exist in a row of `table2`, for example. – Shawn Oct 17 '18 at 18:30
  • Good question, by the way (not a silly one). Maybe someone with experience with PowerBi may give you a better answer. Upvoting. – The Impaler Oct 17 '18 at 18:30
  • @TheImpaler, thank you! This is just something that has bothered me for a long time, and that I guess I would have understood right away had I only paid attention in 'Databases101' instead of falling asleep. I find your comment both enlightening and a bit confusing at the same time. But now I have a sense of where to look further with regards to which queries will or will not need a primary-foreign key relationship. – vestland Oct 17 '18 at 18:37
  • Hi. Please [use text, not images/links, for text (including code, tables & ERDs)](https://meta.stackoverflow.com/q/285551/3404097). Use a link/image only for convenience to supplement text and/or for what cannot be given in text. And never give a diagram without a legend/key (Here of course some of the diagrams are useful in discussing Power BI.) – philipxy Oct 17 '18 at 23:15

3 Answers3

1

Your statement "But you can still subset one table by making selections in the other just like before" is not correct. This is a key issue here.

Relations enable propagation of filter context in Power BI. That's a very loaded phrase, and you will have to learn what it means if you plan to use Power BI. It's the most important concept to understand.

To see what I mean, you will need to write DAX measures and try to manipulate them using your tables. You will immediately see the difference when you have or don't have relations.

How the whole system works (simplified): PowerBI contains a language called "DAX". You will create measures in DAX, and PowerBI will then translate them into its internal language called xmSQL, which is a special flavor of SQL. In xmSQL, regular connection is translated into LEFT OUTER JOIN, like this:

SELECT SUM(Sales.Amount)
FROM Sales
LEFT OUTER JOIN Customer
ON Sales.Customer_Key = Customer.Customer_Key

By-directional relations are a bit more complex, but conceptually similar.

Overall, when you create relations between tables, you are telling PowerBI engine how to join the tables. The engine then also adds some optimizations to speed up the queries. Every time you execute a DAX measure, click a slicer or a visual, PowerBI generates multiple xmSQL statements in the background, executes them, and then renders their results as visuals. You can see these SQL queries with some tools such as DAX Studio.

Note that it's not strictly necessary to establish relations between tables in PowerBI. You can imitate the same behavior using DAX (programmatically), but such "virtual" relations are more complex and can be substantially slower.

RADO
  • 7,733
  • 3
  • 19
  • 33
  • thank you for taking the time to go into the details here! – vestland Oct 17 '18 at 21:01
  • I may be missing your point here, but I really think `"But you can still subset one table by making selections in the other just like before" ` **is** correct. I'll add details as an edit in the question. – vestland Oct 18 '18 at 04:25
  • I am pretty sure it's not. It's not physically possible. – RADO Oct 18 '18 at 04:30
  • 1
    BTW, you are getting a lot of incorrect advise in your question comments. They are all great for relational models/databases, but have no relevance to PowerBI/dimensional model. – RADO Oct 18 '18 at 04:34
  • I just checked my `But you can still..` statement, and can hereby confirm that you are 100% correct! But you *can* subset both tables using the slicer, but I guess that's another thing totally. Thank you for clearing up a pretty brutal misunderstanding! – vestland Oct 18 '18 at 04:37
  • 1
    No, slicer won't work either... if you look closer, you will see that the results are not filtered correctly. – RADO Oct 18 '18 at 04:38
  • 1
    Do a quick test: for table 2, create a measure: Total Count = COUNT(Table2[C]). Now drop Column [A] from table1 on pivot rows, and the measure into values. If you have proper relations, you will see count disected by Column A: 2, 1, 1 . Without relations, you will see the same number (4) for all rows. It means that filtering is not happening. If you create a slicer from table1, it will filter table rows, but the number will remain the same (4). – RADO Oct 18 '18 at 04:47
  • You are 100% right again. Only Table1 is being subset using the slicer. Which makes sense even to me. Thanks again! – vestland Oct 18 '18 at 04:48
  • 1
    Part of a problem is that you tagged your question with "SQL" and "relational database" tags, and they draw experts on SQL and databases. What you need instead is experts on dimensional modeling, star schema, power Bi, DAX, power pivot, etc. These are very, very different things. – RADO Oct 18 '18 at 04:51
  • yes I guess my tagging was a bit off. But If I had tagged the question more correctly, I would not have gotten these exact clarifications from you, and I think they turned out to be very valuable! And just to avoid any further misunderstandings from my part, is your statement that `In xmSQL, regular connection is translated into LEFT OUTER JOIN` the direct answer to the question `What is happening under the hood when a relationship is established between tables`? – vestland Oct 18 '18 at 04:56
  • 1
    Yes. When you setup a connection, it tells PowerBi engine what xmSQL query to generate for your DAX measures. With a regular connection, it will generate left outer joins. Without connections, it will simply multiply tables (create Cortesian join). bi-directional connection should translate into full outer join (I think), and you should avoid the latter two like a plague. – RADO Oct 18 '18 at 05:00
  • Thanks again! One last detail for now: When you have two tables in Power BI, it seems that a relationship between them is established by default. And if you try to show the contents of a table using a Table Visualization, [the table will only display distinct values of the categories](https://stackoverflow.com/questions/52759886/power-bi-make-a-table-visualization-display-all-observations-of-a-dataset). – vestland Oct 18 '18 at 05:09
  • (continued) Now I'm guessing that this is a feature to make a table subsettable by making selections in another table, and that only displaying unique tables is the same thing as ensuring that the table has a primary key. Is this somehow correct, or am I way off again? – vestland Oct 18 '18 at 05:09
  • 1
    Default relations are a feature of PowerBI, which can be turned off (that's what I always do - I prefer to set relations myself) - File->Options->Current File->Data Load->Relationships, turn off "Autodetect new relationship after data is loaded". As for the distinct values, it's a more complex question. My advice: never, ever create visuals by dragging table columns into value area. Write DAX measures and use them instead. Some time later you will thank me :) – RADO Oct 18 '18 at 05:27
1

In the RM (relational model) & ERM (entity-relationship model) tables represent relation(ship)s/association. Hence, relational in "RM" & relationship in "ERM".

FKs (foreign keys) get erroneously called "relationships" in pseudo-ERM methods. A SQL FK constraint says subrows appear elsewhere as PK (primary key) or UNIQUE. A DBMS uses them to disallow invalid updates & to optimize queries.

Power BI "relationships" are not FKs. They are instructions on how to build queries.

When there is a FK we do often want to join on it. So we often want a Power BI relationship when there is a FK.

Create and manage relationships in Power BI Desktop
(See also its Download PDF link for Developer.)

PS We do not need constraints to hold or be declared or be known to query. The constraints (Including PKs, FKs, UNIQUE & cardinalities) are determined by the table meanings--(characteristic) predicates--& what business situations can arise. If constraints hold then we just sometimes get fewer rows than otherwise & some query pairs always return the same results when otherwise they wouldn't.

Foreign keys are not needed to join tables!
Is there any rule of thumb to construct SQL query from a human-readable description?

PS Cross join is inner join with a TRUE condition (or no condition in some DBMSs), period. Whether there is a "relationship" aka FK is irrelevant. If the condition is FK=PK or anything else other than TRUE then it's not a cross join; otherwise it is a cross join whether or not there is a FK between the tables. It's just that we frequently want PK=FK in a condition & tools can & do use the presence of a FK towards a default condition.

CROSS JOIN vs INNER JOIN in SQL Server 2008

philipxy
  • 14,867
  • 6
  • 39
  • 83
1

You asked "What is happening under the hood?" The simple answer is "Statements about relationships."

Many well meaning people draw ER diagrams and seem to either forget or be unaware of the fact that their ER diagrams are really "pictures of statements in language."

The problem is ambiguity.

Many well meaning people jump straight to ER diagrams without also expressing the logical statements on which their ER diagrams are based. In effect, this means that the person who draws the ER diagram seems to expect that the "reader" of the ER diagram will be able reconstruct the statements from which the ER diagram was drawn.

Here is an example to illustrate what I mean. My purpose is to show the linguistic basis of the "under the covers" relationship between Students and their Addresses.

So, what's under the covers is language!

A simple diagram ER Diagram 1

The statements from which the diagram is derived. The statements

A more complex diagram

Student lives at Address

The statements from which the diagram is derived.

enter image description here

Ken Evans
  • 362
  • 1
  • 11