Why is MySQL's default collation latin1_swedish_ci?

Question

What is the reasoning behind setting latin1_swedish_ci as the compiled default when other options seem much more reasonable, like latin1_general_ci or utf8_general_ci?

Possible duplicate of [Why does MySQL use latin1\_swedish\_ci as the default?](http://stackoverflow.com/questions/3936059/why-does-mysql-use-latin1-swedish-ci-as-the-default) — syrkull, Feb 10 '16 at 05:09
Please note that `utf8_general_ci` does not support 4-byte UTF-8 so for true UTF-8 support you would want `utf8mb4_general_ci` or one of the other `mb4` variants. — ColinM, Sep 25 '18 at 21:30
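
To illustrate the comment above: a minimal sketch, assuming Python's `utf-8` codec as a stand-in for the encoding itself (it is not MySQL). It counts the bytes UTF-8 needs per character. The sample characters and the names `samples`, `utf8_lengths`, and `needs_mb4` are chosen here for illustration only.

```python
# Sketch: how many bytes does UTF-8 need for each sample character?
# MySQL's legacy `utf8` charset stores at most 3 bytes per character,
# so anything needing 4 bytes requires a utf8mb4 column.
samples = {"a": "a", "a_ring": "å", "euro": "€", "emoji": "😀"}

utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# Characters that would not fit in a 3-byte-per-character charset.
needs_mb4 = [name for name, n in utf8_lengths.items() if n > 3]

for name, n in utf8_lengths.items():
    print(f"{name}: {n} byte(s)")
print("needs utf8mb4:", needs_mb4)
```

Only the emoji crosses the 3-byte limit here; the accented letter and the euro sign fit in 2 and 3 bytes respectively.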

score 137 · Accepted Answer · edited Sep 25 '15 at 17:27

The bloke who wrote it was co-head of a Swedish company.

Possibly for similar reasons, Microsoft SQL Server's default language is us_english.

edited Sep 25 '15 at 17:27 by Giacomo1968 · answered Jul 21 '11 at 06:11 by gbn

7

He is Finnish, but Finnish and Swedish share almost the same special characters, so they share the same case-insensitive collation – kommradHomer Feb 26 '14 at 10:47
9

Talking about 'good defaults'. Which this, of course, is not. Great to see that after what, 20 years? they changed this into a sane default, like `utf8_general_ci`. Good job, MySQL! – Michahell Sep 24 '15 at 10:17
5

Yes, you are right. He named MariaDB (his wife's name is Maria) and MaxDB (his son's name is Max), but why did he leave out his daughter's name? :) LOL! – Ajmal PraveeN Jan 08 '18 at 09:06
@AjmalPraveen Monty named his database projects in chronological order after his kids; My, Max and Maria. – VexingParse Feb 08 '22 at 14:54
@VexingParse Oh, I see.. – Ajmal PraveeN Feb 10 '22 at 00:59
@MichaelTrouw the latin1 charset can be a good default as it is far smaller than utf8. So if your field is a username which can only be a-z and 0-9, I see no reason why all sorts of characters (to name a few: emoji, Ethiopic syllables, and many more) should be acceptable at the cost of resources. – undefined Oct 15 '22 at 11:54
well, yes, but don't design your DB around the requirements for just a username :) – Michahell Oct 20 '22 at 09:19

score 105 · Answer 2 · edited Sep 25 '15 at 17:56

latin1_swedish_ci is a collation for a single-byte character set (latin1), unlike utf8_general_ci.

Compared to latin1_general_ci, it has support for a variety of extra characters used in European languages. So it's the best choice if you don't know what language you will be using and you are constrained to single-byte character sets.
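
A rough illustration of the single-byte point, assuming Python's `cp1252` codec as an approximation of MySQL's latin1 (the two are close but not identical) and using a Swedish sample string chosen here:

```python
# Sketch: the same Swedish letters in a single-byte encoding vs UTF-8.
# Assumption: Python's cp1252 codec approximates MySQL's latin1 charset.
text = "åäö"

single_byte = text.encode("cp1252")  # one byte per character
multi_byte = text.encode("utf-8")    # two bytes per accented letter

print(len(text), "characters")
print(len(single_byte), "bytes in cp1252")
print(len(multi_byte), "bytes in UTF-8")
```

The collation (`_swedish_ci`) only governs how such characters compare and sort. The byte sizes come from the character set.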

edited Sep 25 '15 at 17:56 by Giacomo1968 · answered Jul 21 '11 at 00:22 by Ariel

42

I like this answer because it tries to objectively justify the choice of latin swedish. However, the accepted answer seems a more plausible explanation, from a social perspective, for why swedish was chosen in particular. – Alan Jul 21 '11 at 19:30
3

It's certainly possible that this was the author's reasoning, and just a coincidence that he's Swedish. It seems reasonable that a Swede would want (and know) to support additional European characters. – Matt Jan 28 '14 at 20:11
3

-1 The accepted answer could be just an opinion, but it is 100 times more reasonable than this answer. Also, you can see that "the bloke who wrote it" also named MariaDB after his daughter and MaxDB after his son. – kommradHomer Feb 26 '14 at 10:35
2

"latin1_general_ci it has support for a variety of extra characters used in European languages" - Just to make this clear, utf8_general_ci, unlike utf8_unicode, does have a wide support for European languages specific chars. I don't see an advantage over "latin1_swedish_ci". Or am I wrong? – MEM Jul 01 '15 at 11:52
For example, CHAR(2) latin1 uses 2 bytes, CHAR(2) utf8mb4 (which is full utf8) uses 8 bytes. I use latin1 to store 2-digit country codes because there will never be non-european characters – the_nuts Jan 06 '17 at 21:19
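
The arithmetic in the last comment can be sketched as a worst-case bound: declared length times the maximum bytes per character of the charset. The table `MAX_BYTES_PER_CHAR` and the helper `worst_case_bytes` are hypothetical names for this illustration. Actual on-disk storage depends on the storage engine and row format, so treat the result as an upper bound rather than a measurement.

```python
# Sketch: upper bound on bytes for a CHAR(n) column, per character set.
# Maximum bytes per character for each MySQL charset named in the thread.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def worst_case_bytes(char_length: int, charset: str) -> int:
    """Hypothetical helper: declared length times max bytes per character."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


for charset in MAX_BYTES_PER_CHAR:
    print(f"CHAR(2) {charset}: up to {worst_case_bytes(2, charset)} bytes")
```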
