2

What are practices about using email addresses as the primary key? Should I avoid it and use an auto incremented ID number instead or is the engine able to handle it just as well?

MySQL database but i'm interested in how other engines might handle this (PostgreSQL specially).

Michael Durrant
  • 93,410
  • 97
  • 333
  • 497
d00dle
  • 1,276
  • 1
  • 20
  • 33
  • What exactly do you mean with "as database names?" Why would you name a database like the email of a user. Sounds like strange concept. Or do you actually: mean "*use emails as the primary key for a table*" –  Nov 23 '13 at 20:05
  • @a_horse_with_no_name Yes i meant as primary key. Updated my question – d00dle Nov 23 '13 at 20:10
  • 1
    I like this article: http://database-programmer.blogspot.com/2008/01/database-skills-sane-approach-to.html (and I think it answers this specific question, as to what the author would choose) – user2864740 Nov 23 '13 at 20:16

5 Answers5

8

You should always have a unique integer primary key that has no business value. This is then referred to as a surrogate key.

You should store the email address itself in another field, frequently with an index so it can act as a key for lookups.

This will enable you to provide functionality that is based on locating the user based on using the email address for lookup. Any other functionality at that point then uses that records primary key for other operations, e.g. updating the users address.

Michael Durrant
  • 93,410
  • 97
  • 333
  • 497
  • I think "always" is a bit too strong. There are times when there is a simple, static, small and obvious primary key. So long as you use very strict key selection criteria it's fine. Using email addresses as the PK to identify users falls down on quite a few major key selection criteria (most notably, they change) so it's really a *terrible* idea in this case, and there's no good alternative natural key so a surrogate key is certainly the correct choice here. – Craig Ringer Nov 24 '13 at 01:35
4

It would be perfectly reasonable to use an email address only where a fairly narrow set of criteria are met:

  • The email address is the primary entity, it doesn't identify something else (like a user account, say)
  • There are relatively few FK references to the table or fast FK lookups without joins are vital
  • You don't validate email addresses in any way

In other words it's very rarely appropriate to use an email address as a primary key. The only situation I can really think of where it'd be sensible is software that processes a mail stream, where it wants to record stats about each individual email address.

If you're thinking of using it as an identifier for users, don't do it.

The email address its self is the primary entity

You're not using an email address to identify something else, like a user account say, but instead have a table that's all about email addresses. Say, you're keeping track of how many messages went to/from each address. If you're identifying something else with an email address, don't use it as a primary key. Use a surrogate key if there's no perfectly stable small and simple natural key. Names and email addresses change.

There are relatively few foreign key references

There aren't too many FK references to the table that has the email address as primary key, or you require very fast and join-free lookups in the tables with the FK. You can gain a big performance win if you're searching a table directly for a value (the email), rather than joining on another table and testing the other table for the value. The flip side here is that using email addresses instead of generated surrogate keys adds to the storage needed for tables (thus: bigger, slower tables and indexes) so it's only worthwhile if you really expect to search on foreign keys a lot.

You don't validate email addresses

If you have such a concept as a "valid" or "invalid" email address your rules will change sooner or later, and you'll be in a miserable situation if you're using email addresses as primary keys.

Email addresses are weird

These three email addresses are the same:

user.name@DOMAIN.COM
user.name@DoMAIN.CoM
user.name@domain.com

but these three are all different:

user.name@domain.com
USER.NAME@domain.com
User.Name@domain.com

according to the relevant RFC. Some MTAs agree, others treat the whole thing case-insensitively.

Yeah. Don't use them as PKs.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
1

Use an auto-incremented primary key. You do not need to expose this information to the user, you can represent it visually as if the key was the email address, but you need numbers that are internally consistent and do not change over time.

Remember your primary key is used for linking to other tables, so if someone changes their email address you would have to update all the dependent links as well. This is extremely difficult to get right.

It doesn't matter what SQL database you use, they all work roughly the same way and have similar limitations.

tadman
  • 208,517
  • 23
  • 234
  • 262
  • Cascades would eliminate the need of [manually] "updating all the dependent links". (Now, I agree that an email shouldn't be used as a PK, but not for that reason.) – user2864740 Nov 23 '13 at 20:13
  • 1
    If you're depending on cascades to keep these things in sync, you have other problems. That's a design locked to a single database instance and won't scale beyond that. – tadman Nov 23 '13 at 20:18
0

One important reason to NOT use business information as a primary key and instead use a surrogate primary key is because of foreign keys. Imagine that someone's email address needs to be updated. Can you imagine what a pain it would be to update all that information in ther keys? If your foreign keys are strict enough, you may wind up having to make a duplicate record, update all the child records, then delete the original record. That's much harder to pull of than simply updating 1 records email address if you use a surrogate primary key (auto generated integer usually)

Joe Love
  • 5,594
  • 2
  • 20
  • 32
  • 2
    Counter: Cascades take of handling updates and email addresses don't change often. – user2864740 Nov 23 '13 at 20:18
  • @user2864740 you do have to be sure to enable those cascading constraints, which I usually don't.. but it's a preference on Updates-- I certainly wouldn't usually do this on deletes. The other thing about never using business data as primary keys is that, often, you'll wind up with multi-column primary keys, which can make programming more tedious-- lots easier with just 1 column -- but this will likely not be the case if the email is stored in just 1 column. I agree, that in this particular scenario, it's not 100% evil, but its still counter to good practice in general. – Joe Love Nov 23 '13 at 23:01
  • I don't disagree with that - only with the assertion that the database *can't* handle it. I view that email addresses make a poor PK because they need not be unique within a domain (e.g. can be "shared") and can be duplicated over time (e.g. company emails); this isn't due to a database limitation. – user2864740 Nov 24 '13 at 06:35
  • I sure hope I didn't give the impression that the DB couldn't handle it. I was merely saying that doing that is part of a larger set of habits that make a DB harder to manage. – Joe Love Nov 24 '13 at 07:02
0

Sensible criteria for choosing and designing keys are: Simplicity, Familiarity and Stability. Email addresses are simple and familiar to people who use them and they change relatively infrequently. Many successful websites and systems require unique email addresses to identify users. Email addresses make perfectly good keys for many purposes.

Given that an email address makes a suitable key, the question is should it be a primary key. The choice of a primary key is something that arises when a table has more than one key and you wish to choose one of them as "preferred" for some purpose. Ideas about what should or should not become a primary key are fundamentally subjective and arbitrary. There is no sound theoretical basis on which to make such a choice because a key designated as primary isn't required to be any different in form or function from any other key. On the basis of human preference alone an email address ought to make a "better" choice of primary key than an unfamiliar and irrelevant incrementing number.

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • There are very sound reasons for choosing a surrogate key (integer) over a random varchar that can change over time. – Denis de Bernardy Nov 23 '13 at 22:54
  • Isn't a surrogate key value even more "random" (or at least so arbitrary that it might as well be random)? Surrogate key values change too. There is no absolute basis for choosing one over the other without proper context to explain the purpose for which such a choice is being made. The only reliable answer is "it depends..." – nvogel Nov 23 '13 at 23:11
  • 2
    Email addresses fall down on stability and are absolutely unsuitable for primary keys. They change. They're also not singularly specific - a single email address may be used by multiple people (old people) or, more commonly, one person may have many different email addresses they wish to associate. – Craig Ringer Nov 24 '13 at 01:37
  • Keys can and do change and that's potentially true of *any* key. The fact that multiple people may use a single email address has nothing to do with its suitability as a key. Take a look around. There are a huge number of websites and other systems that use email addresses to uniquely identify user accounts. – nvogel Nov 24 '13 at 08:08