ER model: when to add a relationship and when not to add

Question

Consider this example of an ER design. I have three entities:

Singer
Album
Song

Each singer has at least one album (by definition) and each album has at least one song. Each song belongs to only one album. So I have also two relationships:

album:singer (1-or-more:exactly-1)
song:album (1-or-more:exactly-1)

If I would like to get who sing the song, I could join tables pivoting on album entities.

Now my question is: should I add also a direct relationship between singer and song or not? I don't understand if it depends only on the use case or if there is a strict rule/best practice. If I add it, I use more disk space but I need less memory for querying song author (I don't need joins).

Which is the solution?

step 1: don't worry about disk space as the primary concern - find an appropriate model for the domain first, _then_ you can consider the impact on space/time and decide if you need to adjust your model. also: your 1-to-many relationship between album and song won't support a "greatest hits" album that consists of previously published songs (unless your domain requires that songs are unique to single albums, of course) — fspinnenhirn, Jan 21 '16 at 23:33
Thank you! Your caveat about "greatest hits" is correct. I simplified the modeling to focus the question on the main topic: is there a (thumb) rule/best practice to decide whether adding a relationship or not? — floatingpurr, Jan 22 '16 at 01:48
@philipxy thanks for your correction, I changed the kind of cardinality of the relations in the question. — floatingpurr, Jan 24 '16 at 18:14

score 2 · Accepted Answer · edited May 23 '17 at 10:27

(I am assuming that album:singer & song:album are actually both 1-or-more:exactly-1 because your "one to many"s contradict your other descriptions.)

Re 2 tables being correct: You should not use all three. SingerSong is always equal to (SingerAlbum JOIN SongAlbum) PROJECTed on SINGER & SONG. Using all 3 is redundant in a way that requires the update of multiple tables when the two-table design would involve only one and that requires a multi-table constraint to enforce consistency between SingerSong and the others.

Re designing tables: Every table has a meaning/predicate and every present row & every absent row (that fits it) makes a statement/proposition. We choose sufficient meanings/predicates to be able to describe all situations that can arise. (See this re bases and this re queries and my other answers using "predicate".) "Redundancy" is when two rows make overlapping statements/propositions at the same time. Then we have to change multiple rows when in another design we could just change one. (Or more vs fewer.) We just have to be aware of what our chosen meanings/predicates say. Sadly most information modeling methods & products don't address meanings/predicates of relationships/relations even though they are the foundation of modeling & the source of the terms in Entity-Relationship and Relational Model. Normalization addresses certain redundancies that can be eliminated from a meaning/predicate by splitting it at an AND. (Hence replacing its associated table/relation by a JOIN of others).

Re needing 3 tables: Here you have predicates "singer SINGER made album ALBUM", "song SONG is on album ALBUM" and "singer SINGER sings song SONG". We can't directly express the third in terms of the previous two. But since the third predicate is implied by the first plus the second in your particular application, you don't need it. Ie "singer SINGER sings song SONG" in your application exactly when "for some ALBUM, singer SINGER made album ALBUM and song SONG is on album ALBUM". (Ie the predicate satisfied by the rows in (SingerAlbum JOIN songAlbum) PROJECTed on SINGER & SONG.) But if say multiple singers could be on an album not both singing all songs or if say there were songs sung but not on an album then the first two would not imply the third so you could neither express nor infer what rows are in the third from those in the others. (Notice how your adding "at least one"/"only one" complicated the predicates and constraints and led to a redundant design even though you thought you "simplified the modeling to focus the question on the main topic".

Re optimization: Always produce the most straightforward design first. You should not worry about storage and time tradeoffs until you have a lot more experience with design and querying to appreciate the relevant factors. And then you should use estimates and measurements to justify changes.

Well, if I understand correctly your answer, I should not use a relationship if identical information is available otherwhere in my model (e.g., Join + projection). Right? : ) — floatingpurr, Jan 22 '16 at 01:58
Yes, that is the general idea. See my edit. Note that you *would* need 3 tables if you didn't involve certain clauses like "exactly one" because "identical information" *wouldn't* be present. PS Better to never talk about ("more" or "less") "information". Just talk about what a database or (base variable or query result) table or (present or absent) row "states"/"asserts". (As a consequence of what rows are in what tables.) — philipxy, Jan 22 '16 at 05:22
+1 for the last point - always get it working first, and then optimize *only if* you find that speed or space is an issue. — Andrew Williamson, Jan 22 '16 at 15:54

Andrew Williamson · Answer 2 · 2016-01-21T23:41:16.653

1

Edit: I just realized, you're going with only one singer per album. In this case, you have three relationships:

Singer <-> Album
Song <-> Album
Singer <-> Song

Your understanding is correct - you can opt to use all three, for more disk usage but faster queries. Or, you can choose only two, for less disk usage and slower queries.

It is always a case by case basis - if you know that you can represent the relationships with two tables, without losing data, then the choice between speed and size is yours to make. One important factor to consider, though, is future proofing. Say for example, in the future you might want to have more than one singer per album. Then it is no longer implicit as to which singer sings which song. All things considered, it is a database, so you usually expect it to be big and fast.

edited Jan 21 '16 at 23:41

answered Jan 21 '16 at 23:27

Andrew Williamson

8,299
3
34
62

Thanks Andrew. So ER theory does not suggest a best practice in that case, ins't it? – floatingpurr Jan 22 '16 at 02:00
Not so far as I can remember. But @philipxy makes a good point about redundancy - normally, you let the database enforce constraints, so it makes it hard to get the database into an inconsistent state. However, when you have redundant data, it is quite easy to enter contradicting values. It is not easy to set up the database to enforce consistency when one table is equivalent to the join of two others. – Andrew Williamson Jan 22 '16 at 16:03

ER model: when to add a relationship and when not to add

2 Answers2