5

According to the documentation (emphasis mine):

TEXT[(M)] [CHARACTER SET charset_name] [COLLATE collation_name]

A TEXT column with a maximum length of 65,535 (216 − 1) characters. The effective maximum length is less if the value contains multibyte characters. Each TEXT value is stored using a 2-byte length prefix that indicates the number of bytes in the value.

Would it be more accurate to say that a TEXT column can store 65535 bytes? What is the specific impact of multibyte characters in a TEXT column?

Here's the source of my confusion:

In MySQL 5, CHAR and VARCHAR fields were changed so that they count characters instead of bytes (e.g., you can fit "你好,世界!" into a VARCHAR(6)). Did TEXT fields get the same treatment, or do they still count bytes?

todofixthis
  • 1,072
  • 1
  • 12
  • 25

1 Answers1

10

My knowledge: a character in utf-8 is max 32 Bit large (4-Byte).

Edit: utf8 is only max 3-Byte large in mysql. utf8mb4 is max 4-Byte large.

So the worst case with only the largest characters:

utf8: 65535 / 3 = 21845
utf8mb4: 65535 / 4 = 16383,75 =~ 16383

https://stackoverflow.com/a/9533324/2575671

Edit2:

I tested local with 10.1.21-MariaDB. Test characters utf-8:

1-Byte: a

2-Byte: ö

3-Byte: 好

4-Byte:

utf8: 21845 @3-Byte (好)
utf8mb4: 16386  @4-Byte ()

Screenshot:

local test

https://i.stack.imgur.com/5xgvL.png

Community
  • 1
  • 1
Steffen Mächtel
  • 981
  • 8
  • 13
  • 1
    In MySQL, for whatever reason `utf8` encoding only seems to allow for 3-byte characters. For 4 you need to use `utf8mb4`. – tadman Apr 25 '17 at 19:26
  • Probably worth doing a test, too, to see if those numbers are correct. I haven't really exercised those limits. MySQL seems confused on some of these issues. – tadman Apr 25 '17 at 19:40
  • 1
    the "whatever reason" is a tradeoff between character space and search performance, because MySQL uses `COLLATION`s for comparison, so utf-8 strings cannot easily be compared "bit by bit". (at least by default, unless you explicitly specify a `binary` comparison) .. reducing the character space by one byte significantly improves search performance. – Kaii Apr 25 '17 at 19:40
  • I'm having some difficulty grokking this answer. I *think* the idea here is to demonstrate that the limit of a `TEXT` column is actually 65535 _bytes_, but there's nothing here that actually shows that. I think the answer could be improved by demonstrating what happens when trying to store a value > 65535 bytes but < 65535 characters. Also, showing the `CREATE TABLE` for `test_mb4` would be helpful. – todofixthis Apr 28 '17 at 21:37
  • Also, is it possible to demonstrate this using MySQL? MariaDB may be intended to be a drop-in replacement for MySQL, but that doesn't mean it works exactly the same (otherwise, there'd be no point to using it!). – todofixthis Apr 28 '17 at 21:38
  • 1
    The Insert Queries for the test had more than 65535 Characters (100.000). The texts were cut off. The Statement: CREATE TABLE `test_mb4` (`description` text CHARACTER SET utf8mb4 NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; – Steffen Mächtel May 04 '17 at 22:02