140

What is the reasoning behind setting latin1_swedish_ci as the compiled default when other options seem much more reasonable, like latin1_general_ci or utf8_general_ci?

Peter Mortensen
Alan
  • 7
    Possible duplicate of [Why does MySQL use latin1\_swedish\_ci as the default?](http://stackoverflow.com/questions/3936059/why-does-mysql-use-latin1-swedish-ci-as-the-default) – syrkull Feb 10 '16 at 05:09
  • 1
    Please note that `utf8_general_ci` does not support 4-byte UTF-8 so for true UTF-8 support you would want `utf8mb4_general_ci` or one of the other `mb4` variants. – ColinM Sep 25 '18 at 21:30

2 Answers

137

The bloke who wrote it was co-head of a Swedish company.

Possibly for similar reasons, Microsoft SQL Server's default language is us_english.

Giacomo1968
gbn
  • 7
    He is Finnish, but Finnish and Swedish share almost the same special characters, so they share the same case-insensitive collation – kommradHomer Feb 26 '14 at 10:47
  • 9
    Talking about 'good defaults'. Which this, of course, is not. Great to see that after what, 20 years? they changed this into a sane default, like `utf8_general_ci`. Good job, MySQL! – Michahell Sep 24 '15 at 10:17
  • 5
    Yes, you are right. He named MariaDB (his wife's name is Maria) and MaxDB (his son's name is Max), but why did he leave out his daughter's name? :) LOL! – Ajmal PraveeN Jan 08 '18 at 09:06
  • @AjmalPraveen Monty named his database projects in chronological order after his kids; My, Max and Maria. – VexingParse Feb 08 '22 at 14:54
  • @VexingParse Oh, I see.. – Ajmal PraveeN Feb 10 '22 at 00:59
  • @MichaelTrouw the latin1 charset can be a good default as it is far smaller than utf8. So if your field is a username which can only be a-z and 0-9, I see no reason why all sorts of characters (to name a few: emoji, Ethiopic syllables, and many more) should be acceptable at the cost of resources. – undefined Oct 15 '22 at 11:54
  • well, yes, but don't design your DB around the requirements for just a username :) – Michahell Oct 20 '22 at 09:19
105

latin1_swedish_ci is a collation for the single-byte latin1 character set, unlike utf8_general_ci, whose character set is multi-byte.

Compared to latin1_general_ci it has support for a variety of extra characters used in European languages. So it is a good choice if you don't know what language you will be using and you are constrained to single-byte character sets.
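
To make the single-byte versus multi-byte difference concrete, here is a minimal Python sketch that counts how many bytes a few strings need in each encoding. It is only an illustration under some assumptions: `encoded_sizes` is a hypothetical helper, Python's `latin-1` codec is ISO-8859-1 (MySQL's latin1 is closer to cp1252), and the numbers are encoded lengths, not MySQL's actual on-disk storage for a column.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of `text` per encoding.

    Hypothetical helper for illustration; None means the text cannot
    be represented in that encoding at all.
    """
    sizes = {}
    for name in ("latin-1", "utf-8"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            sizes[name] = None
    return sizes


# Plain ASCII, Swedish letters (å ä ö), and an emoji (U+1F600).
samples = ("SE", "\u00e5\u00e4\u00f6", "\U0001F600")
for sample in samples:
    print(repr(sample), encoded_sizes(sample))
```

ASCII text costs the same in both encodings, the Swedish letters take one byte each in latin-1 but two each in UTF-8, and the emoji needs four bytes in UTF-8 and cannot be encoded in latin-1 at all.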

Giacomo1968
Ariel
  • 42
    I like this answer because it tries to objectively justify the choice of latin swedish. However, the accepted answer seems a more plausible explanation, from a social perspective, for why swedish was chosen in particular. – Alan Jul 21 '11 at 19:30
  • 3
    It's certainly possible that this was the author's reasoning, and just a coincidence that he's Swedish. It seems reasonable that a Swede would want (and know) to support additional European characters. – Matt Jan 28 '14 at 20:11
  • 3
    -1 The accepted answer could be just an opinion, but it is 100 times more reasonable than this answer. Also, you can see that "the bloke who wrote it" also named MariaDB after his daughter and MaxDB after his son. – kommradHomer Feb 26 '14 at 10:35
  • 2
    "latin1_general_ci it has support for a variety of extra characters used in European languages" - Just to make this clear, utf8_general_ci, unlike utf8_unicode, does have a wide support for European languages specific chars. I don't see an advantage over "latin1_swedish_ci". Or am I wrong? – MEM Jul 01 '15 at 11:52
  • For example, CHAR(2) latin1 uses 2 bytes, CHAR(2) utf8mb4 (which is full utf8) uses 8 bytes. I use latin1 to store 2-digit country codes because there will never be non-european characters – the_nuts Jan 06 '17 at 21:19