2

When I create a table, even if it's using integers only, I set the default charset to utf8 (because I copy paste the code and because in case I introduce a string column in the future).

Example:

CREATE TABLE IF NOT EXISTS `articles` (
  `id` smallint(6) unsigned NOT NULL,
  `disabled` tinyint(1) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

However, i'm wondering if it's affecting "performances" to have a default charset in a table that do not make use of it.

Barbz_YHOOL
  • 589
  • 4
  • 15
  • [premature optimization is the root of all evil](https://stackify.com/premature-optimization-evil/#:~:text=Here%20is%20the%20full%20quote,of%20it). What are you trying to win, in time, or in resources used, if you leave out the `CHARSET=utf8` ? – Luuk Feb 25 '21 at 18:27
  • I have no idea, I'm asking for information from experts – Barbz_YHOOL Feb 25 '21 at 18:28
  • `smallint` is definitely a premature optimization you will regret. IDs get used up fast, and 32767 is smaller than you think. Use [`serial`](https://dev.mysql.com/doc/refman/8.0/en/numeric-type-syntax.html). – Schwern Feb 25 '21 at 18:47
  • I don't think it's premature because I know it will never have more than 2000 IDs used :P – Barbz_YHOOL Feb 25 '21 at 18:51
  • 1
    When predicting the future, there's only one prediction you can rely on; that your prediction is wrong. [Defensive programming](https://en.wikipedia.org/wiki/Defensive_programming) is the craft of preparing your code for the unexpected, like ID exhaustion. To save 6000 bytes, a meaningless amount, you're risking a mysterious failure if you hit ID 32768. Side note, your ID is not auto-incremented. It should be. – Schwern Feb 25 '21 at 19:03
  • it was just an example query, not the real one :P mine is auto incremented, thank you for analyzing it in depth though!! – Barbz_YHOOL Feb 25 '21 at 19:05

2 Answers2

2

Tables have a character set no matter what, so no, there's no performance issue, and UTF-8 is a good default choice (but utf8 is not). But you still shouldn't do that.

It is a bad practice to add a default character set unless you need to specify one. This overrides the default character set of the database which might not be utf8. You're risking making a table with a different character set than every other table causing confusion.

Instead, make sure the server and database character set are set correctly. Then let your tables use the default, unless you have a specific reason to do otherwise.

For example, UTF-8 is a good default choice, but MySQL got UTF-8 wrong. utf8 cannot handle all of UTF-8. You should instead be using utf8mb4 (UTF-8 4-byte). The database might correctly use utf8mb4, but you're overriding that with a less capable character set.

See Specifying Character Sets and Collations and Unicode Support.

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • today mysql uses utf8 as synonym for utf8mb4 – nbk Feb 25 '21 at 18:47
  • Thank you. I'm not on mysql 8 yet and also this sql might be shared to other people, so I think setting the default charset is wanted in that case (because people could have another charset as default) – Barbz_YHOOL Feb 25 '21 at 18:48
  • @nbk ["*utf8 is an alias for the utf8mb3 character set.*"](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html) They might change it in the future. We don't know what version they're using. – Schwern Feb 25 '21 at 18:49
  • 1
    @Barbz_YHOOL That's exactly the confusion I'm referring to. You want to use their default. You don't want, for example, your table using utf8mb3 and the rest using latin1. It's much better to use their database's character set, that's what they will be encoding their strings as. – Schwern Feb 25 '21 at 18:54
  • ok when the user inserts his own strings. But what if the table already contains a set of data with strings that are in english? And what if it's both case in one ? (in that case i suppose it's better to let them use their default charset anyway) – Barbz_YHOOL Feb 25 '21 at 19:01
  • 1
    @Barbz_YHOOL Importing and exporting tables and data is another issue. "*contains a set of data with strings that are in english*" Language does not define the character set; english can be stored in many different character sets. While it's gotten much easier with UTF-8 becoming the default, character sets and encodings are always complicated and must be done with consideration. Trying to find a blanket solution for all situations often makes the problem worse. – Schwern Feb 25 '21 at 19:09
1

The DEFAULT CHARSET clause at the bottom of your table creation is only metadata. It is only used if you add a CHAR/VARCHAR/TEXT column and don't explicitly define the column's character set. Then the table's default character set is used.

Tables don't have any performance characteristic — they are just storage. Queries have performance.

Since your table has no columns with character sets, there can be no query against this table that is affected by the character set. Therefore the default character set has no effect.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828