7

My web application will have various types of users: a root user, sys-admins, customers, contractors, etc. I'd like to have a table "User" that will have the columns "Username", "Password", and "Role" with role being one of the aforementioned roles. I will, for example, also have a table called "Customer" for storing customer-specific attributes.

Here's my question. Since all usernames will be unique, I could use the username as the primary key for the User table. Would it make more sense, though, to create a "User ID" column and use it rather than the Username column as the PK? There are many other tables that will use this User PK as a foreign key and I would think that, from a performance standpoint, it will be faster to compare user IDs rather than username text strings.

Thank you for your response.

Mr. Alien
  • 153,751
  • 34
  • 298
  • 278
Jim
  • 13,430
  • 26
  • 104
  • 155

3 Answers3

9

This is the old natural vs. surrogate key debate.

In a nutshell, nothing is cut-and-dry and you'll need to balance between competing interests:

  1. If you anticipate you'll need to update the key, use surrogate (which doesn't need updating) to avoid ON UPDATE CASCADE.
  2. If you need to lookup the key, but not the other columns, use natural key to avoid the JOIN altogether. But if you need to lookup more than just the key, use surrogate to make FKs and accompanying indexes slimmer and more efficient.
  3. Do you anticipate range scans on the natural key? If yes, cluster it. However, secondary indexes can be expensive in clustered tables, which plays against a surrogate key.
  4. Is your table very large? Save space by having only one index (on natural key) instead of two (on natural and surrogate).
  5. Composite natural keys (that are at the child endpoints of identifying relationships) may be necessary for correctly modeling diamond-shaped dependencies.
  6. Surrogates may be more friendly to ORM tools.

In the case of usernames...

  1. You'll probably want to allow the update.
  2. It's unlikely you'll need to lookup only the username.
  3. Range scans on usernames make no sense (unlike equi-searches).
  4. You aren't storing the whole population of the Earth, are you?
  5. We are talking about "stand alone" natural key here, not a natural key that is the result of an identifying relationship.
  6. Unknown?

...so the surrogate key is probably justified in this case.

Community
  • 1
  • 1
Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • Thanks, Branko. You brought up some interesting points I hadn't considered. I've decided to go with the surrogate key as you and HLGEM suggested. – Jim Jul 25 '12 at 01:05
3

I would create the userid because while usernames might be unique, they are not generally unchangeable and I would prefer not to have to change thousands or millions of records because HLGEM wanted to become HLGEM_1. Further joins on integers are faster.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
3

There are two schools of thought. Database purists believe that you should always use natural keys wherever possible (and it makes sense). So if you subscribe to that line of thought, and the username is unchangeable, then make the key the username.

Database pragmatists believe that surrogate keys are more useful, and tend to stand up better to changes in requirements. They can also be faster in some cases, particularly with large string keys. There are also security concerns, in that using natural keys you might leak information that wouldn't be available from a surrogate key.

Erik Funkenbusch
  • 92,674
  • 28
  • 195
  • 291