6

I'm trying to draw an ER diagram for my project management software describing the following. It contains these entities:

  • project - software projects
  • tasks - software projects that can be broken into a number of tasks
  • employees - employees that belong to this software

And:

  1. A project can be divided into tasks. (Tasks can be created by the admin user, who can assign those tasks to selected projects. Here there is only assignment of tasks to projects, not assignment of employees to projects.)

  2. Employees can be assigned to projects.
    (An employee can be assigned to projects. Here there is only assignment of employees to projects, not assignment to tasks of projects.)

  3. For a selected task of a selected project we can assign employees from the pool--employees that are assigned to that project in 2. (This time we must specify project, task & employee; all 3 selections are mandatory.)

The input processes for 1, 2 & 3 above can be done in separate pages in the system. You can select any of them first.

For the above relationships I created this ERD:

enter image description here

Consider

  • relationship 1 between project and task
  • relationship 2 between project and employee

Is there a need for the two separate relationships as in the ER diagram, relationship no 1 & relationship no 2?

or

can we use only relationship 3 among project, employee and task, relationship no 3, for that also?

philipxy
  • 14,867
  • 6
  • 39
  • 83
Susantha7
  • 898
  • 1
  • 20
  • 38
  • @SachithrraDias & Susantha7 Post-edit/bounty: If my answer has not answered the question for you, please comment on it to say why. Is there something you don't understand? As I said in my last a comment on my answer, if you want to know whether your design is "correct" then that is not the question that the question asks and you need to clarify it (without invalidating (reasonable) answers) or ask a new question. Putting a bonus on an question that does not clearly ask for what you want is not going to help get you the answers you want. PS I edited my answer to clarify. – philipxy Aug 14 '17 at 23:06
  • Susantha7 (& @SachithrraDias) Please explain how an entity type's cardinality is to be interpreted in your diagram. There are two styles. In one style an entity type's cardinality is how many times a participating entity can appear in rows. In the other style an entity type's cardinality is how many times combinations of the other entities can appear with a participating entity instance. Anyway, ER diagrams only give some of the info needed for a design. You should really give the condition (*predicate*) for when a row goes in a relationship and give DDL including constraints. – philipxy Aug 14 '17 at 23:11
  • i wont understand much in your answer.... thats why i ask my diagram is correct... can you please explain your answer simple to understand way... and to the point – Susantha7 Aug 15 '17 at 04:16
  • Then use comments on my answer to ask for it to be clearer where you do not understand. What is the first thing you don't understand? Do you understand relational algebra or SQL? PS Lately translate.google.com made some quantum AI improvements, be sure to try it. – philipxy Aug 15 '17 at 04:19

1 Answers1

4

TL;DR You need all three relationship types/tables. Because if you drop one then in some situations you lose data--there is no way to use the remaining ones to answer all the same questions.

Different constraints could mean we can drop a relationship/table because it can be expressed in terms of others. Normalization to higher NFs (normal forms) tells us when we can replace a relationship/table by smaller/simpler ones.


Each relationship table holds the rows that participate in the relationship. We can describe the relationship via a predicate (statement template):

1 Divides_to holds (T, P) rows where project P divides to task T
2 Has holds (E, P) rows where employee E is assigned to project P
3 holds (E, T, P) rows where employee E is assigned to task T on project P

Can we drop 1? If we ignore employees in 3 then we get rows where some employee is assigned to task T on project P. But (per above) that is not the rows in 1. Maybe project p1 divides to task t1 in 1 but no employee is assigned to task t1 on project p1; then row (t1, p1) in 1 is not a subrow in 3. And there is no task info in 2. So we can't use 3 & 2 to replace 1.

Can we drop 2? Similarly: If we ignore tasks in 3 then we get rows where employee E is assigned to some task on project P. But (per above) that is not the rows in 2. Maybe employee e1 is assigned to project p1 but is not assigned to a task on project p1; then row (e1, p1) in 2 is not a subrow in 3. And there is no employee info in 1. So we can't use 3 & 1 to replace 2.

Can we drop 3? Using 1 & 2 we can get rows where employee E is assigned to project P AND project P divides to task T. But (per above) that is not the rows in 3. They differ if an employee assigned to a project isn't assigned to all its tasks or if a task of a project doesn't have all its employees assigned to it. There's no other way to generate 3 from 1 & 2. So we can't use 1 & 2 to replace 3.

So we need all three relationships.

When constraints hold, certain query expressions always return the same results as certain others that otherwise wouldn't. So under different constraints we might be able to drop a relationship/table because we can express its content via queries/views of others. And we might choose different relationships/tables.

Normalization to higher NFs guides decomposing a relationship into simpler others by which it can be expressed instead per certain constraints.


PS 1 That's also why we need the entity types/tables and not just the relationship types/tables. (If we didn't want them anyway for entity-specific attributes or just ER modeling conventions.) Eg the three relationships can't tell you about employees that aren't assigned to a project or to a task & project. Similarly for tasks & for projects.

PS 2 We ignore an attribute in relational algebra by not projecting on it. We ignore a column in SQL by not selecting it. The result's predicate is that FOR SOME value for the attribute/column, the old predicate holds. Relational natural join gives the rows whose relationship/predicate is the AND of input relationships/predicates. In SQL for no duplicate rows & no shared nullable columns that's select distinct from natural join.

PS 3 Under common sense your design satisfies certain constraints: If a task-project pair appears in 3 then it must appear in 1 and if an employee-project pair appears in 3 then it must appear in 2. One way to reflect that in ER modeling is by reifying the task-project & employee-project relationships to associative entities then replacing 3 by a what ER calls a binary relationship on those entities. Relationally, the relationship/table is still ternary on values, where certain subrows happen to identify those entities. A way to get a constrained relationally binary 3 is to add an employee-project PK (primary key) or CK (candidate key) id in 2 and replace the composite FK (foreign key) in 3 by such an id. Then we have a binary on entities and on values. Some pseudo-ER methods do this.

PS 4 This style of (true Chen) ER diagram doesn't typically use SQL nulls. But as it happens you could replace all three relationships by a variant of 3 with nulls. You would null-extend the binary relations and union them with the ternary. As usual, nulls complicate predicates. Usually we add a nullable column as an alternative to adding a separate table sharing a null-free CK (candidate key). But this is different, without the savings in space or joins; it just complicates things. (Including important constraints.)

    E IS NULL
AND task T is of project P
AND NOT EXISTS E [employee E is assigned to task T of project P]
OR  T IS NULL
AND employee E is assigned to project P
AND NOT EXISTS T [employee E is assigned to task T of project P]
OR  employee E is assigned to task T of project P

(Also it's problematic in SQL because SQL unique, primary key & join are not the relational things by those names because they treat null specially.)

PS 5 Some answers of mine re such ternary vs binary relation(ship) types/tables/predicates:
Should this ER diagram use a ternary relationship instead
Best Solution - Ternary or Binary Relationship
Why can't you just join in fan trap?
And re design & predicates:
Modeling multiple many to many relationships between the same entities in a relational database
What is the difference between an entity relationship model and a relational model?
Is there any rule of thumb to construct SQL query from a human-readable description?

PS 6 Has is an unhelpfully generic relationship name/meaning/table. Use meaningful names like Is_assigned_to or Assignment.

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • you mean is above ER diagram is correct... for this scenario – Susantha7 Aug 09 '17 at 16:20
  • 2
    1. Would 'yes' or 'no' be informative? 2. That is a different question than in your post, ask a new question. But it's probably too broad. Because: 3. There is seldom a 'correct' in engineering, there are tradeoffs. Also, don't use that word until you know & say exactly what *you* mean by it. 4. Otherwise you're really just asking for a book on information modeling & database design. Read some. 5. Your tables reflect your description. No table can tell you all of what another does. That's reasonable. My answer explores a couple of designs. Try some. Be able to describe every situation. – philipxy Aug 10 '17 at 02:56