1

I have one table. How can I normalize this.

enter image description here

philipxy
  • 14,867
  • 6
  • 39
  • 83
sanjeet
  • 1,539
  • 15
  • 37
  • 1
    have you read all the different forms of normalization ? – bane19 Aug 24 '15 at 19:53
  • yaa i read it , and i think it should be 3nf but not not sure that why i am here. – sanjeet Aug 24 '15 at 19:54
  • 4
    Any answers won't help you here in the long run. Better learn yourself, try and then come back here with doubts/problems you are facing. – bane19 Aug 24 '15 at 19:56
  • 4
    What are the keys and dependencies in this table? What makes you think it doesn't already satisfy 3NF? – nvogel Aug 24 '15 at 20:28
  • 3
    Please say what minimal normal form you are supposed to "normalize" toa and why this table "should be [in] 3nf" and why you are "not sure". Otherwise your question is just asking for a textbook's chapter on normalization. – philipxy Aug 25 '15 at 04:41
  • You can refer below link from SO https://stackoverflow.com/questions/246701/what-is-normalisation-or-normalization – Ashutosh B Bodake Dec 05 '18 at 06:00
  • 1
    Please [use text, not images/links, for text--including tables & ERDs.](https://meta.stackoverflow.com/q/285551/3404097) Paraphrase or quote from other text. Use images only for what cannot be expressed as text or to augment text. Images cannot be searched for or cut & pasted. Include a legend/key & explanation with an image. Make your post self-contained. – philipxy May 22 '19 at 09:10

2 Answers2

3

Good question given your data set. Keep in mind the whole point of normalization is to reduce duplication. 3NF is often the best way to go. But given my experience I've found very little benefit to pulling a repeated value out into a different table if it will be the only value in that table. Take for example your most duplicated column, emp_type. If you were to normalize it into a separate table , it would look like this:

Emp_Type_Id | Emp_type
------------------------
1           | Manager
2           | Engineer
3           | Tech Support

And your current table would look like:

Emp_ID | Emp_Name | Salary | Emp_Type_Id | Emp_Skill
----------------------------------------------------
1      | raj      | 90000  | 1           | Department
2      | ravi     | 50000  | 2           | Software
3      | shyam    | 70000  | 2           | Hardware
.
.
.

This is technically more normalized than before as the emp_type value is no longer repeated in your database. But it's significantly more hassle to deal with a separate table and a relationship for a single value. If emp_type included other information as well such as Valid_Salary_Range, Department_Location, etc. then it would absolutely be best to normalize it out into a separate table. But if it's only a single value you are normalizing, what's the difference between storing the value as opposed to storing an ID pointed to that value? In my opinion it's a pointless extra step.

Long story short, I would not normalize your table at all, it's already normalized to a sufficient level.

danjuggler
  • 1,261
  • 2
  • 21
  • 38
  • 1
    Normalization to higher NFs does not involve introducing new column names. Also this shows no understanding of normalization. – philipxy May 22 '19 at 09:12
-1

Even before we start, I'll mention each step's rules first because I don't know the dependencies in your table. Stating the rules and my assumptions about your data will clarify how I arrived at the final solution.

That way, for the least, you will take away the understanding of how normalization works and can go ahead with doing it by yourself on the data that only you understand the best.



The Natural Order of Database Designing

  1. Start with specifying the entities (these are the tables)
  2. Specify the needed attributes of your entities (these are columns)
  3. Specify the unique property of your entity (the primary key of your table). If there is none, give one yourself (also called synthetic or surrogate primary keys).
  4. Finally specify the relation between these entities (primary key-foreign key relation and 1-N, N-N relation etc.)


Now, that the basic designing rule is in place, it is easy to see that your single table actually has four separate entities fused together. They are:

1. Employee - who has id and name

2. Role - which can be Engineer, Tech Support etc

3. Department - which can be Software, Hardware, Department (could use better word though), Voice etc.

4. Salary - which will have EmpId, DateOfChange and Amount. This is because employees have different salaries and also the salaries of same employee changes over time.

So, we are going to break this table to four tables as mentioned above. Out of the four tables, Employee table has id as the primary key, Role and Department need synthetic (maybe even auto-incrementing) keys and Salary will have {EmpId, DataOfChange} together as the primary key. This will look like:

Table Name    Columns
Employee      Id, Name, RoleId, DeptId
Role          RoleId, RoleType
Department    DeptId, DeptName
Salary        EmpId, DateOfChange, Amount

All the above table can have more items. I'm trying to hit the design that has minimalistic difference from that table you have already given. Like Salary table can also have a field like ReasonOfSalaryChange which could have values like NewHiring, Promotion etc... But we hold the changes to minimal.

Now we can move to actual normalization of your data.


Commonly Used Normalizations

I never had to use anything above 3NF. So I'll limit the explanation below till the third normal form.

1NF: simply states that all columns must have atomic values. If a column needs more than one value, create another table. Your new tables pass that test. All values in your table's columns are atomic.

2NF: requires 1NF qualification and that any non-key field should be dependent on the entire primary key. All your non-key fields (Name in Employee, RoleType in Role, DeptName in Department, Amount in Salary) depend on the respective table's primary key (Id, RoleId, DeptId and {EmpId, DateOfChange}). So the tables qualify 2NF fine.

3NF: requires 2NF qualification and that no non-key field should depend upon any other non-key field. This means that there should be no dependencies between table's columns except on the primary keys. Role, Department and Salary tables are default 3NF qualified as they have only one non-key column and it's dependent on PK. Employee table, you can verify yourself, has no non-key element that is dependent on any other non-key element. So the tables are perfectly 3NF qualified the way they currently are.


All that's left now is to define that RoleId, DeptId and EmpId are foreign keys from the Role, Department and Employee table respectively.

displayName
  • 13,888
  • 8
  • 60
  • 75
  • 1
    I would add that in the real world, salary probably needs to be pulled out into a salary history table with start and end dates for the salary as it was over time. This is temporal data and past values may be needed for research into problems and reporting.. – HLGEM Sep 11 '15 at 14:53
  • 1
    People forget that we don't store data just to store it. If it is to have value, we have to understand what the data will be used for not just how it gets into the database. Too many people design only to make it easy for the application to insert the data and never consider why they want that data in the first place. When you don't think ahead to why, you don't find out that most of the time it is going to be used in "What was the salary on this date?" type of queries. If you haven't stored the data that way, then you can't answer the questions the users need to ask about the data. – HLGEM Sep 11 '15 at 17:28
  • 2
    Downvoted for perpetuating entity/table conflation. Entities are represented by _values_, tables represent predicates. – reaanb Oct 01 '15 at 05:33
  • 1
    "entities (these are the tables)" No. "relation between these entities (primary key-foreign key relation and 1-N, N-N relation etc.)" No. "2NF: requires 1NF qualification and that any non-key field should be dependent on the entire primary key" No, non-prime fields must not be dependent on a proper subsetof a CK. "3NF: requires 2NF qualification and that no non-key field should depend upon any other non-key field." No. Look up a correct definition yourself in a (good) textbook. "PK" No. PKs are irrelevant, CKs matter. Etc. ("dependencies in your table" Yes, relevant.) – philipxy May 22 '19 at 09:31