1

I need some best-practice and performance advice.

Let's say I have three tables: Employees, Jobs and Ranks. Every employee has a job and a rank, so obviously I should reference those tables in my Employees table.

My question is, which of these options is best:

1) Each job and rank are stored with a unique ID paired a descriptive name. The Employees table should reference the unique ID in the other table, thus saving memory (the descriptive name is only saved once in the Jobs/Ranks table), but to see the descriptive names I'll need to do JOINs:

SELECT Employees.EMPL_ID, Ranks.R_NAME, Jobs.J_NAME
FROM Jobs
JOIN Ranks ON Ranks.R_ID=Employees.RANK
JOIN Jobs ON Jobs.J_ID=Employees.JOB

2) Just unique descriptive names. It can be a waste of memory, because i repeatedly save the descriptive name of each rank / job, but I save time on my SELECT statements

<EDIT:>

Just to clarify, my main concern is the performance I'll have to deal with if I'll need to perform SELECTs with multiple JOINs instead of one SELECT statement.

I want to be able to deal with lots of traffic - specifically, Employees requests to see their Job and Rank.

<EDIT>

Examples:

Option 1 (IDs and names):

Employees:
 __________________________
/ EMPL_ID  |  RANK  | JOB  \
|    1     |    2   |  3   |
|    1     |    1   |  3   |
|    1     |    1   |  1   |
\__________|________|______/

Ranks:
 __________________
/  R_ID  |  R_NAME \
|    1   |   GRUNT |
|    2   |   BOSS  |
\________|_________/

Jobs:
 ____________________
/  J_ID  |  J_NAME   \
|   1    | JANITOR   |
|   3    | PRESIDENT |
\________|___________/

Option 2 (unique names):

Employees:
 _______________________________
/ EMPL_ID  |  RANK  | JOB       \
|    1     |  BOSS  | PRESIDENT |
|    1     |  GRUNT | PRESIDENT |
|    1     |  GRUNT | JANITOR   |
\__________|________|___________/

Ranks:
 __________
/   R_NAME \
|    GRUNT |
|    BOSS  |
\__________/

Jobs:
 ___________
/  J_NAME   \
| JANITOR   |
| PRESIDENT |
\___________/
Dori
  • 1,035
  • 1
  • 12
  • 22
  • 1
    You always can, it never hurts, and it sometimes helps. If unsure, add a numeric ID. – John Dvorak Dec 02 '12 at 13:27
  • @JanDvorak: It's not true to say that "*it never hurts*", as there is clearly a cost in joining/looking up values from another table. In most cases, such cost may be immaterial - but it's always non-zero. – eggyal Dec 02 '12 at 13:44

2 Answers2

1

Yes always give each row a unique id.

Best Practice it to always have this for each table. Usually called 'id' or the-table-name_id'

It should have no business value.

Many 'guaranteed unique' records later find the need or presence or duplicate records and always having a unique primary key helps hugely when this is met / discovered.

One example of 'unique'... that isn't.... if a system has people's Social Security Numbers they should be unique. However one could be mistyped. Then when the person with the 'mistyped' value presents and their number is tied typed in... In allowing / resolving this is will be really helpful for all rows to have their own id that is not the ssn and has no business value at all other than identifying the row.

Unique records is a very well known problem. Having a unique ID for all records is part of the solutions that address it.

The exception to all of the above is performance. I am not too concerned about the join speed for a few thousand records as SQL databases are well designed for speed in doing that. I have found the advantage of unique identification out-weighs disadvantages. There may be cases where you change the above practice due to performance requirements. For instance if there are millions of records that have to be loaded into memory, the overhead of unique ID's space may become an issue. Often if these cases though folks start to look at no-sql solutions like Redis, MongoDB, etc.

Here are some additional references on SO and other sites:

What's the best practice for primary keys in tables?

in general, should every table in a database have an identity field to use as a PK?

http://www.sql-server-performance.com/forum/threads/do-i-need-a-unique-identifier-or-identity-column.16910/

is an ID column really needed in SQL?

As well commented in one answer "use of natural vs. surrogate keys in kind of a religious debate in the community'. Also there's a comment about how the answerer got their 'rules'... tee-hee...

Community
  • 1
  • 1
Michael Durrant
  • 93,410
  • 97
  • 333
  • 497
  • Can you cite any references for your "best practice" assertion? Especially for naming the column `id`, which I feel is poor practice (it leads to ambiguity across joins). The field value itself surely meets your criteria to be itself a primary key? – eggyal Dec 02 '12 at 13:29
  • Not really. Just what I learned in the field and in various classes over 15 years of application building and 7 years of data warehousing. – Michael Durrant Dec 02 '12 at 13:32
  • 1
    I consider SO to be less 'cited references' (Wikipedia) and more 'personal recommendations' but others may be able to help with them. – Michael Durrant Dec 02 '12 at 13:33
  • Sure, added some more comments in my answer – Michael Durrant Dec 02 '12 at 13:41
  • To my reading, all of your cited links contradict your statement "*Best Practice it to always have this for each table*". Most of the links suggest that natural keys have their place, as do surrogates. The OP wanted to know which was most appropriate to his needs, and I fear your answer is merely following "always have a surrogate key" dogma (although I do like the SSN example of when natural keys can go awry). – eggyal Dec 02 '12 at 13:49
  • Sure, I am just contributing my opinion and some links not really looking for a protacted discussion. Might just be best for you to put your points in an answer. As mentioned this becomes a religious debate so lets not do that :) Both of our values and important are important. – Michael Durrant Dec 02 '12 at 14:19
  • Sorry, wasn't aiming for protracted discussion either. Thing is, I don't have an answer to the OP's question (or I would have posted it): I'm genuinely curious to hear others' opinions (and, obviously, their justifications for them). – eggyal Dec 02 '12 at 14:27
  • Thanks, I'll go with this as my answer and rephrase my question in a new thread. Not that this wasn't informative, but I realize that I need to get a bit more specific – Dori Dec 02 '12 at 14:38
0

I would highly recommend that you do add an EMPL_ID (EmployeeID). At the moment your application may work perfectly fine but when you extend it, even if you think you are not going to, the EMPL_ID will come in handy.

Not only that, but if you have the EMPL_ID available in code and need access to some other table that you build out now or in the future, all you have to do is add the EMPL_ID to that table instead of having to duplicate both the R_ID and J_ID in the new table.

An example would be if you added a tblNotes Table. (I don't know the scope of your application so I will just refer to a notes table for this discussion)

In this example, you would only need the following columns: noteID, EMPL_ID, note, noteDateTime....

Without adding an EMPL_ID you would have extra columns that are unnecessary and most likely in multiple tables!

Also, adding indexes would have to be on only one column.

I always add an ID to every table because it makes life so much easier, esp when your application grows. Also, I have even seen in some cases that the company has had two employees with the same name! Of course, it is unlikely they have the same rank and job but this is just food for thought!

I hope I understood your question correctly and gave you some helpful information.

John

John
  • 56
  • 2
  • I don't think the question was about how to define a PK on the `Employees` table, but rather on the `Ranks` and `Jobs` tables. – eggyal Dec 02 '12 at 13:45
  • Yup. @eggysl is right, i just wanted to know about R_ID and J_ID – Dori Dec 02 '12 at 14:06