I recommend:
CREATE TABLE ...
ISO_630_3 CHAR(3) CHARACTER SET ascii
That will be 3 bytes, which is smaller than INT
(4 bytes)and not much bigger than
SMALLINT UNSIGNED` (2 bytes).
(Am I correct in saying that the codes are always 3 ascii letters? Hence no need for VAR
, which takes an extra byte or two.)
CHAR(3)
is readily indexable. There is no significant advantage in 'normalizing' even to smallint. This still applies even at the scale of a billion rows.
And, as you point out, "easier to read" is worth something.
If you are also storing text, I assume that all such text can be mapped to UTF-8? If so, use
my_text TEXT CHARACTER SET utf8mb4
In MySQL, there is no problem having different columns in a single table using different charsets (or collations).
Perhaps worth noting... Many languages can be discovered from the hex utf-8 encoding:
⚈ Cxyy -- More Western Europe: Latin (C3-CA), Combining Diacritical Marks (CC-CD), Greek (CE-CF)
⚈ Dxyy -- Cyrillic (D0-D4), Hebrew (D6-D7), Arabic/Persian/Farsi (D8-DB), etc
⚈ E0yyyy -- various Indian character sets, southern Asia, etc.
⚈ E1yyyy -- Cherokee, Balinese, Khmer, Mongolian, Vietnamese, etc.
(etc)
-- http://mysql.rjweb.org/doc.php/charcoll#diagnosing_charset_issues