Setting up a basic table, is integer auto-increment primary key a standard?

Question

I've got a web app, I have a concept of users, which will probably go into a user table like:

table: user
username (varchar 32)  |  email (varchar 64)  |  fav_color  |  ...

I'd like username and email to be unique, meaning I can't allow users to have the same username, or the same email. I see example tables of this sort always introduce an integer auto-increment primary key.

Not sure why this is done, is it to somehow speed up queries by foreign keys later on? For example, let's say I have another table like:

table: grades
username (foreign key?)  |  grade

Is it inefficient to be using the username as a foreign key? I want to do queries like:

SELECT FROM grades WHERE username = 'john'

so I guess it'd be faster to do an integer lookup for the database instead?:

SELECT FROM grades WHERE fk_user_id = 20431

Thanks

JIRA made the mistake of mapping to usernames for referential integrity -- there's a "feature request" to fix this that's been there for years... — OMG Ponies, Feb 08 '11 at 20:42

Andrew · Accepted Answer · 2011-02-08T21:20:13.980

What you are asking is somewhat a design decision based on the judgment of the individual data modeler. Personally, in this case, I would include the auto-incremented integer primary key. It is unusual to be able to guarantee that username (and even more so e-mail address) will be unchanging. However, you can design your software so that the same integer primary key always refers to the same user, regardless of what else may change about that user record.

What would help the performance of username lookups would be a UNIQUE constraint on username with an index that corresponds to it. If you really want e-mail addresses to be unique (mostly a business requirement decision), you could also put a UNIQUE constraint on e-mail address. Foreign keys are ignored in the default database engine in MySQL (unfortunately), so I won't bother going into the benefits there from a data modeling perspective.

Edit:

I guess I will go into the benefits for foreign keys if they are being enforced now. Yes, there are provisions for updating all the data that depends on a foreign key (such as ON UPDATE CASCADE). However, they are often poorly understood and viewed as difficult to maintain. It is usually a better practice to have a foreign key refer to something unchanging, hence your integer primary key.

@OMG Ponies Thanks for the update. I guess I've been out of the MySQL loop a little too long. — Andrew, Feb 08 '11 at 21:12

score 1 · Answer 2 · answered Feb 08 '11 at 20:37

My advice, after years of db-building

only use chars as PK when they don't represent anything in the real world.

The real world is a caotic place, and as soon as you use PK's from it, you're one a slope.

Just trust me.

(and there's a speed gain too).

regards, //t

score 1 · Answer 3 · edited May 23 '17 at 12:20

1

It may not necessarily be a "standard" per se, but it is quick, easy, convenient and generally resistant to business key changes.

See also: Pros and cons of autoincrement keys on every table

edited May 23 '17 at 12:20

Community

1
1

answered Feb 08 '11 at 20:39

Daniel DiPaolo

55,313
14
116
115

score 1 · Answer 4 · answered Feb 08 '11 at 20:41

1

Integers as the primary key will make your life a lot easier down the road as your application evolves. Use an index on your username and/or email for the query optimization.

answered Feb 08 '11 at 20:41

gergi

469
2
5

score 1 · Answer 5 · answered Feb 08 '11 at 20:42

1

I like integer keys because:

make joins faster
smaller and faster indexes
never need to change (your username and email field values may need to change)

answered Feb 08 '11 at 20:42

Ray

21,485
5
48
64

VARCHAR(4) takes the same amount of space as INT. – OMG Ponies Feb 08 '11 at 20:42
true, but there is no auto-increment feature to populate it for me – Ray Feb 08 '11 at 21:05

score 0 · Answer 6 · answered Feb 08 '11 at 20:41

0

Indexes on Integer columns perform faster than on large character values. Having the primary key on the narrow identity column is the most optimal solution.

answered Feb 08 '11 at 20:41

Vishal Gamji

59
6

score -1 · Answer 7 · edited May 23 '17 at 11:55

-1

Using real-world data as foreign keys is very problematic, and "inefficient" because they violate referential integrity. You think that user names and emails are unique and will never change? You are almost certainly wrong. Read an earlier question on natural keys

Integer auto-increment primary keys will be faster, but that's not why they're used. They're used because they work. Use them.

edited May 23 '17 at 11:55

Community

1
1

answered Feb 08 '11 at 20:46

Dour High Arch

21,513
29
75
90

1

"Real-world" data as foreign keys does not violate referential integrity. The purpose of a foreign key constraint after all is to ensure that RI *cannot* be violated. Maybe you mean that you may have need of some key value that isn't in the table being referenced, however that situation is exactly the same whatever key you use. Similarly, you would create a constraint on the natural key because ensuring unique emails (or whatever) is important to your data model and your business - for example to be a unique login name for user accounts. – nvogel Feb 11 '11 at 06:54
Unfortunately, in the real world, "natural keys" violate the very constraint you require. They are neither unique nor persistent. – Dour High Arch Feb 11 '11 at 19:19
You can use in the database the same identification scheme you use in reality. Otherwise your database will be an inaccurate representation of that reality. In reality different things are always uniquely identifiable. In any case, what you said was that "real" keys violate RI which is certainly untrue and that was what I wanted to point out. – nvogel Feb 12 '11 at 09:22

Setting up a basic table, is integer auto-increment primary key a standard?

7 Answers7