Here's my languages table definition with an auto-increment column (the DBMS is MySQL):
DROP TABLE IF EXISTS languages;
CREATE TABLE IF NOT EXISTS languages (
language VARCHAR(16) NOT NULL,
PRIMARY KEY (language)
) ENGINE=InnoDB;
Here's another version of it, but with the UNIQUE constraint applied to it:
DROP TABLE IF EXISTS languages;
CREATE TABLE IF NOT EXISTS languages (
language_id TINYINT NOT NULL AUTO_INCREMENT,
language VARCHAR(16) NOT NULL,
PRIMARY KEY (language_id),
UNIQUE (language)
) ENGINE=InnoDB;
I'm kind of in two minds about which version is better to use. On one hand, the first definition seems to be the correct one according to database design theory, simply because there is no extra junk in it and the PRIMARY KEY constraint guarantees that there cannot be two rows with the same value, that is, there is no way that, for instance, the word 'English' can appear twice in the column, which is, of course, a good thing. But the problem with this is that the foreign key field in another table that references the language column has to store strings instead of ID numbers. That simply means the referencing table will be storing the whole thing in the column and if the application can provide a drop-down combobox list with pre-populated unique values in it, having the languages table seems to have no point. But, theoretically speaking, the first way is still more correct.
On the other hand, the second approach sounds more practical. For ensuring uniqueness we can use the UNIQUE constraint and we have integers instead of strings in the referencing columns which tend to take up less memory and, as far as I know, they are much faster during searching operations than strings.
Please, help me get this straight.