VARCHAR as foreign key/primary key in database good or bad?

Question

Is it better if I use ID nr:s instead of VARCHARS as foreign keys? And is it better to use ID nr:s isntead of VARCHARS as Primary Keys? By ID nr I mean INT!

This is what I have now:

category table:
cat_id ( INT ) (PK)
cat_name (VARCHAR)

category options table:
option_id ( INT ) (PK)
car_id ( INT ) (FK)
option_name ( VARCHAR )

I COULD HAVE THIS I THINK:

category table:
cat_name (VARCHAR) (PK)

category options table:
cat_name ( VARCHAR ) (FK)
option_name ( VARCHAR ) ( PK )

Or am I thinking completely wrong here?

Good decision. VARCHAR could be used to support an inherited database, or legacy code. It's also more difficult ( though definitely not impossible ) to increment letters. — Armstrongest, Jan 20 '10 at 17:19
In second case, you can just drop the category table. Doesn't bring anything to the table, unless you mean to extend it in future. — Rok Kralj, Mar 02 '13 at 16:54

score 41 · Accepted Answer · answered Jan 20 '10 at 17:14

41

The problem with VARCHAR being used for any KEY is that they can hold WHITE SPACE. White space consists of ANY non-screen-readable character, like spaces tabs, carriage returns etc. Using a VARCHAR as a key can make your life difficult when you start to hunt down why tables aren't returning records with extra spaces at the end of their keys.

Sure, you CAN use VARCHAR, but you do have to be very careful with the input and output. They also take up more space and are likely slower when doing a Queries.

Integer types have a small list of 10 characters that are valid, 0,1,2,3,4,5,6,7,8,9. They are a much better solution to use as keys.

You could always use an integer-based key and use VARCHAR as a UNIQUE value if you wanted to have the advantages of faster lookups.

answered Jan 20 '10 at 17:14

Armstrongest

15,181
13
67
106

9

But the problem with numeric identifiers is they are hard to read and you always need to join to other tables to obtain their meaning when looking at your data. Imagine a table that relates three (or more) other tables with all IDs as integers? You have three columns of numbers which have no immediate meaning. – Tony Jan 20 '10 at 17:20
13

@Tony: Immediately readable data typically means denormalized data. – OMG Ponies Jan 20 '10 at 17:28
2

@Tony. I know what you mean, however, that's what front-ends are for. They are for reading the data. The Database Structure should probably be created for efficiency, ease of maintenance, robustness, and scalability. Readability can be done with some Quick Views if you need it while developing. While it can be tempting to give every key a readable name, in any sizeable database, you will end up with problems trying to make every key readable. Not everything that CAN be done, should be done. – Armstrongest Jan 20 '10 at 17:34
4

@OMG Ponies: I take your point, but it's not always an immediate indication of denormalized data. In the example @astander gave (Stock Exchange company codes) it makes perfect sense to use a CHAR key for the company. They are all unique but also quite readable. – Tony Jan 20 '10 at 17:40
2

@Atomiton: Yes, front ends are there to make things easier but can you honestly say you have done all your database problem solving through a front end other than the one provided by the database? :) – Tony Jan 20 '10 at 17:43
@Tony: I said *typically*, not *always*. Using a VARCHAR should be because there is a natural key, while an integer would be an artificial key because as you pointed out - there's no implicit relationship between the number and the data. – OMG Ponies Jan 20 '10 at 18:16
@Tony: Why would you look at the key columns? You know they exist, and you know you can join on them. My limited experience is that pure join tables like you've mentioned are rare, and it's easy to write a query to display them in a more understandable form. – David Thornley Jan 20 '10 at 18:19
@OMG Ponies: I had noticed I'd missed the word "typically", sorry about that. I tried to edit my comment but it's locked now. – Tony Jan 20 '10 at 18:33
1

@David Thornley: You look at them because they identify your data. If you were looking for info using a company stock code (to continue the example) why not use the code directly, rather than having to look up a number to then use that to find the data you actually want? As I predicted this is rapidly becoming a discussion about natural vs surrogate keys and it's all my fault... :) – Tony Jan 20 '10 at 18:38
@Tony: No worries. My main point was, beyond what's already provided, why VARCHAR vs Int would be chosen when data modelling. – OMG Ponies Jan 20 '10 at 18:49
@Tony: It's not your fault; the whole question is tied up with natural vs. surrogate keys. For a natural key, you use VARCHAR when approriate. For a surrogate key, you'd always use an integer or something. I just seem to be more on the surrogate-key side than you are. – David Thornley Jan 20 '10 at 19:04
1

@All. Sure, a company stock code, for example, would make a great readable unique primary key. After all, it should ALWAYS be a unique way to represent a product, right? However, sometimes we can't predict how our data will be used in the future. One scenario ( there are many ) is this: You discontinue an old product and ( accidentally? ) reuse the stock code in the future. You now have no [easy] way to isolate the old from the new data. Note that the point isn't the example, it's that allowing the computer to automate the selection of unique keys future proofs your data better. – Armstrongest Jan 25 '10 at 17:19
4

Nothing random here. Stock code change is not unheard of (SUN -> JAVA). Same company, how to deal with this? Unique - NO. Not the moment you get international. Or use not only stocks but also other financial instruments. Stock code - which one? There are multiple symbologies (reuters etc.) with standardized (but different) codes. – TomTom Jan 21 '13 at 08:15
3

-1. Your second-to-last sentence is either incorrect, or **very** misleading. The 'characters' in an integer type have no meaning to the database whatsoever, it is simply a visual representation of a byte value. An INT is just as fast as a VARCHAR(4), and a BIGINT is just as fast as a VARCHAR(8) (assuming all the characters are one byte in length). – NobleUplift Jun 12 '14 at 15:07
-1 You will never get a join mismatch due to whitespace differences. That's what foreign keys are designed to prevent, which is what this question is all about. At worst, you'll get a constraint violation, and perhaps have to scratch your head over why it isn't allowing a value that seems to match one that's present in the referenced table and column. But any sane application will constraint inserted values to be from the referenced table and column in the first place (e.g., populate a drop-down list for the user to choose from), so this normally won't be a problem. – Marcelo Cantos Jul 09 '14 at 03:20

score 18 · Answer 2 · answered Oct 20 '12 at 22:25

18

My 2 cents:

From a performance perspective, using CHAR or VARCHAR as primary key or index is a nightmare.

I've tested compound primary keys (INT + CHAR, INT + VARCHAR, INT + INT) and by far INT + INT was the best performance (loading a data warehouse). Lets say about twice more performance if you keep only numeric primary keys/indexes.

answered Oct 20 '12 at 22:25

jrvidotti

520
1
6
12

How did you do these tests? – v1d3rm3 Mar 13 '23 at 15:36

score 15 · Answer 3 · answered Jan 20 '10 at 17:48

When I'm doing design work I ask myself: have I got anything in this data that I can guarantee is going to be non-NULL, unique, and unchanging? If so that's a candidate to be the primary key. If not, I know I have to generate a key value to use. Assuming, then, that my candidate key happens to be a VARCHAR I then look at the data. Is it reasonably short in length (meaning, say, 20 characters or less)? Or is the VARCHAR field rather long? If it's short it's usable as a key - if it's long, perhaps it's better to not use it as a key (although if it's in consideration for being the primary key I'm probably going to have to index it anyways). At least part of my concern is that the primary key is going to have to be indexed and will perhaps be used as a foreign key from some other table. Comparisons of VARCHAR fields tend to be slower than the comparison of numeric fields (particularly binary numeric fields such as integers) so using a long VARCHAR field as a key may result in slow performance. YMMV.

Adriaan Stander · Answer 4 · 2010-01-20T17:17:51.930

4

I would say it is fine to use VARCHAR as both PRIMARY and FOREIGN KEYS.

Only issue I could forsee is if you have a table, lets say Instruments (share instruments) and you create the PRIMARY/FOREIGN KEY as VARCHAR, and it happens that the CODE changes.

This does happen on Stock Exchanges, and would require you to rename all references to this CODE, where as a ID nr would not require this from you.

So to conclude, I would say this dependes on your intended use.

EDIT

When I say CODE, I mean the Ticker Code for lets say GOOG, or any other share. It is possible for these codes to change over time, lets say you look at Dirivative/Future instruments.

edited Jan 20 '10 at 17:17

answered Jan 20 '10 at 17:12

Adriaan Stander

162,879
31
289
284

Thanks for the answer, please explain "CODE"... check this Q out to find out my intended use please: http://stackoverflow.com/questions/2100008/can-this-mysql-db-be-improved-or-is-it-good-as-it-is – Jan 20 '10 at 17:16
1

Even if a key does change it shouldn't be a problem if your database is correctly set up to cascade changes. – Tony Jan 20 '10 at 17:16
1

@Tony I don't think I've ever come across a situation where I would want to cascade a change like that. In fact, I've never used a database where we've had cascade switched on :-) – Matthew Farwell Jan 20 '10 at 17:22
1

@MatthieuF: True, keys (hopefully!) don't change but I guess you might have to deal with it in one way or another if they did. [putting my neck on the line here] I prefer to use natural keys, hence I've had to think about this possibility :) – Tony Jan 20 '10 at 17:36

score 4 · Answer 5 · answered Jan 20 '10 at 17:15

4

with an int you can store up to 2 billion in 4 bytes with varchars you cannot you need to have 10 bytes or so to store that, if you use varchars there is also a 2 byte overhead

so now you add up the 6 extra bytes in every PK and FK + the 2 byte varchar overhead

answered Jan 20 '10 at 17:15

SQLMenace

132,095
25
206
225

Yeah, I mean, when it comes down to it, each character in a VARCHAR is a binary number to a processor. It just has to do the work of translating it using an ASCII lookup table for each letter. "A" == 65 == 00000000 01000001. That's stored for each letter. In fact, if you're using words as Keys, you'd run out of English words LONG before you used the possible number combinations in just TWO BYTES. – Armstrongest Jan 20 '10 at 17:24
I don't think the speed of queries is particularly affected by the use of VARCHAR. A database I work on uses CHAR primary keys on some tables with over 4 million records and queries execute in under a second. IU don't think a surrogate integer key would make a lot of difference. – Tony Jan 20 '10 at 17:29
@Tony char doesn't have the 2 byte overhead of a varchar also my tables are in the billion row ranges so 2 bytes extra per key is 7 GB extra I have to pay for SAN space just for one table...then backups take longer etc etc when you have large table you need to optimize a lot more than with a couple of million rows – SQLMenace Jan 20 '10 at 17:35
Fair point. The database I have at the moment is the largest I've worked with so yes, I can see those extra couple of bytes would cause a problem. Also, thinking about it I don't have any VARCHAR keys, only CHAR! :) – Tony Jan 20 '10 at 17:47
1

Overhead is still overhead. It may not be that significant, but unless there's a good reason, it still isn't a great idea, unless there's a good business reason. – Armstrongest Jan 25 '10 at 17:17

score 4 · Answer 6 · answered Jan 20 '10 at 17:41

4

If you make the category name into the ID you will have a problem if you ever decide to rename a category.

answered Jan 20 '10 at 17:41

SorcyCat

1,206
10
19

score 2 · Answer 7 · answered Jan 20 '10 at 17:26

2

There's nothing wrong with either approach, although this question might start the usual argument of which is better: natural or surrogate keys.

If you use CHAR or VARCHAR as a primary key you'll end up using it as a forign key at some point. When it comes down to it, as @astander says, it depends on your data and how you are going to use it.

answered Jan 20 '10 at 17:26

Tony

9,672
3
47
75

Lol. Sometimes I wonder if, in the future, wars will be started over issues like these. You're right, though, it always depends on the data and how it will be used. Personally, I've been burned too many times using non-integer types as a PK, so I put up with the "non-human-readableness" of PKs that don't relate directly to the record. – Armstrongest Jan 25 '10 at 17:26

VARCHAR as foreign key/primary key in database good or bad?

7 Answers7

Linked