TL;DR You need all three relationship types/tables. Because if you drop one then in some situations you lose data--there is no way to use the remaining ones to answer all the same questions.
Different constraints could mean we can drop a relationship/table because it can be expressed in terms of others. Normalization to higher NFs (normal forms) tells us when we can replace a relationship/table by smaller/simpler ones.
Each relationship table holds the rows that participate in the relationship. We can describe the relationship via a predicate (statement template):
1 Divides_to
holds (T, P)
rows where project P divides to task T
2 Has
holds (E, P)
rows where employee E is assigned to project P
3 holds (E, T, P)
rows where employee E is assigned to task T on project P
Can we drop 1? If we ignore employees in 3 then we get rows where some employee is assigned to task T on project P
. But (per above) that is not the rows in 1. Maybe project p1 divides to task t1 in 1 but no employee is assigned to task t1 on project p1; then row (t1, p1) in 1 is not a subrow in 3. And there is no task info in 2. So we can't use 3 & 2 to replace 1.
Can we drop 2? Similarly: If we ignore tasks in 3 then we get rows where employee E is assigned to some task on project P
. But (per above) that is not the rows in 2. Maybe employee e1 is assigned to project p1 but is not assigned to a task on project p1; then row (e1, p1) in 2 is not a subrow in 3. And there is no employee info in 1. So we can't use 3 & 1 to replace 2.
Can we drop 3? Using 1 & 2 we can get rows where employee E is assigned to project P AND project P divides to task T
. But (per above) that is not the rows in 3. They differ if an employee assigned to a project isn't assigned to all its tasks or if a task of a project doesn't have all its employees assigned to it. There's no other way to generate 3 from 1 & 2. So we can't use 1 & 2 to replace 3.
So we need all three relationships.
When constraints hold, certain query expressions always return the same results as certain others that otherwise wouldn't. So under different constraints we might be able to drop a relationship/table because we can express its content via queries/views of others. And we might choose different relationships/tables.
Normalization to higher NFs guides decomposing a relationship into simpler others by which it can be expressed instead per certain constraints.
PS 1 That's also why we need the entity types/tables and not just the relationship types/tables. (If we didn't want them anyway for entity-specific attributes or just ER modeling conventions.) Eg the three relationships can't tell you about employees that aren't assigned to a project or to a task & project. Similarly for tasks & for projects.
PS 2 We ignore an attribute in relational algebra by not project
ing on it. We ignore a column in SQL by not select
ing it. The result's predicate is that FOR SOME value for the attribute/column, the old predicate holds. Relational natural join
gives the rows whose relationship/predicate is the AND of input relationships/predicates. In SQL for no duplicate rows & no shared nullable columns that's select distinct
from
natural join
.
PS 3 Under common sense your design satisfies certain constraints: If a task-project pair appears in 3 then it must appear in 1 and if an employee-project pair appears in 3 then it must appear in 2. One way to reflect that in ER modeling is by reifying the task-project & employee-project relationships to associative entities then replacing 3 by a what ER calls a binary relationship on those entities. Relationally, the relationship/table is still ternary on values, where certain subrows happen to identify those entities. A way to get a constrained relationally binary 3 is to add an employee-project PK (primary key) or CK (candidate key) id in 2 and replace the composite FK (foreign key) in 3 by such an id. Then we have a binary on entities and on values. Some pseudo-ER methods do this.
PS 4 This style of (true Chen) ER diagram doesn't typically use SQL nulls. But as it happens you could replace all three relationships by a variant of 3 with nulls. You would null
-extend the binary relations and union
them with the ternary. As usual, nulls complicate predicates. Usually we add a nullable column as an alternative to adding a separate table sharing a null-free CK (candidate key). But this is different, without the savings in space or joins; it just complicates things. (Including important constraints.)
E IS NULL
AND task T is of project P
AND NOT EXISTS E [employee E is assigned to task T of project P]
OR T IS NULL
AND employee E is assigned to project P
AND NOT EXISTS T [employee E is assigned to task T of project P]
OR employee E is assigned to task T of project P
(Also it's problematic in SQL because SQL unique
, primary key
& join
are not the relational things by those names because they treat null
specially.)
PS 5 Some answers of mine re such ternary vs binary relation(ship) types/tables/predicates:
Should this ER diagram use a ternary relationship instead
Best Solution - Ternary or Binary Relationship
Why can't you just join in fan trap?
And re design & predicates:
Modeling multiple many to many relationships between the same entities in a relational database
What is the difference between an entity relationship model and a relational model?
Is there any rule of thumb to construct SQL query from a human-readable description?
PS 6 Has
is an unhelpfully generic relationship name/meaning/table. Use meaningful names like Is_assigned_to
or Assignment
.