927

Per the MySQL docs, there are four TEXT types:

  1. TINYTEXT
  2. TEXT
  3. MEDIUMTEXT
  4. LONGTEXT

What is the maximum length that I can store in a column of each data type assuming the character encoding is UTF-8?

Lalith B
  • 11,843
  • 6
  • 29
  • 47
  • 38
    Take for example the TEXT type. It can contain 65535 *bytes* of data. UTF-8 contains multi-byte characters. Therefore, if you filled the field using only the danish character "Ø", you would only get 32767 characters, as that UTF-8 character is composed of two bytes. If you filled it with "a", you would get 65535 characters. – Andrew Plank Sep 18 '13 at 10:36
  • 3
    Also consider reading [Which DATATYPE is better to use TEXT or VARCHAR](http://stackoverflow.com/questions/1203710/which-datatype-is-better-to-use-text-or-varchar) – Somnath Muluk Aug 23 '16 at 10:48

4 Answers4

1740

From the documentation (MySQL 8) :

      Type | Maximum length
-----------+-------------------------------------
  TINYTEXT |           255 (2 8−1) bytes
      TEXT |        65,535 (216−1) bytes = 64 KiB
MEDIUMTEXT |    16,777,215 (224−1) bytes = 16 MiB
  LONGTEXT | 4,294,967,295 (232−1) bytes =  4 GiB

Note that the number of characters that can be stored in your column will depend on the character encoding.

Déjà vu
  • 28,223
  • 6
  • 72
  • 100
Bridge
  • 29,818
  • 9
  • 60
  • 82
  • 3
    @Bridge Not sure I understand, but this means that TINYTEXT can get up to 255 characters, am I right??? – ltdev May 27 '14 at 13:19
  • 10
    @Lykos Yes, well - depending on the characters. From the documentation: `A TEXT column with a maximum length of 255 (28 – 1) characters. The effective maximum length is less if the value contains multi-byte characters.` See Ankan's answer for more detail. – Bridge May 27 '14 at 14:05
  • 5
    @aurel.g This is how you really answer question. And I agree with Christophe, this is how mySQL should present its parameters--even if just as a supplementary shorthand to their...arcane text view. – cbmtrx Nov 04 '15 at 18:16
  • 1
    It might be worth adding that the order of magnitude of a character is a couple of bytes (min. 1 I suppose). So one might store 10,000-50,000 characters in a TEXT column, ... – Vince Jan 29 '16 at 21:26
  • 2
    @GaborSch So you're saying the documentation is wrong? I'm afraid I have just written a test which inserts 65535 characters into a TEXT column with no problem. – Bridge Oct 21 '16 at 12:45
  • 1
    @Bridge 65535 ASCII characters (which weight 1 byte each). What he meant is a rough average of what you would be able to do when storing real-life text which could contain multibyte characters – Mouradif Apr 29 '18 at 00:16
  • 1
    Hi everyone, can anyone please tell me, how the memory is allotted. For example, if i'm using LONGTEXT, is that 4GB is allotted by default or based on the value we are inserting, it will increase until that maximum size (4GB)?? – abubakkar Aug 13 '18 at 07:02
  • 123
    Why is it harder to find this in the docs than in stackoverflow – Boris D. Teoharov Sep 11 '18 at 11:52
  • 2
    @BorisD.Teoharov It seems to be the way MySQL names their doc pages, making it harder to find if not specifying the exact required words. The new doc is here https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html – Déjà vu Jun 14 '20 at 08:07
  • 1
    @Bridge Does KiB mean KB (kilobyte or kibibyte)? Does MiB mean MB (megabyte)? Does GiB mean GB (gigabyte)? https://en.wikipedia.org/wiki/kilobyte vs https://en.wikipedia.org/wiki/Kibibyte – 1.21 gigawatts Aug 10 '20 at 12:09
  • @1.21gigawatts Much of the time both symbols are used interchangably, but technically it's the latter. – Bridge Aug 11 '20 at 14:40
287

Expansion of the same answer

  1. This SO post outlines in detail the overheads and storage mechanisms.
  2. As noted from point (1), A VARCHAR should always be used instead of TINYTEXT. However, when using VARCHAR, the max rowsize should not exceeed 65535 bytes.
  3. As outlined here http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-utf8.html, max 3 bytes for utf-8.

THIS IS A ROUGH ESTIMATION TABLE FOR QUICK DECISIONS!

  1. So the worst case assumptions (3 bytes per utf-8 char) to best case (1 byte per utf-8 char)
  2. Assuming the english language has an average of 4.5 letters per word
  3. x is the number of bytes allocated

x-x

      Type | A= worst case (x/3) | B = best case (x) | words estimate (A/4.5) - (B/4.5)
-----------+---------------------------------------------------------------------------
  TINYTEXT |              85     | 255               | 18 - 56
      TEXT |          21,845     | 65,535            | 4,854.44 - 14,563.33  
MEDIUMTEXT |       5,592,415     | 16,777,215        | 1,242,758.8 - 3,728,270
  LONGTEXT |   1,431,655,765     | 4,294,967,295     | 318,145,725.5 - 954,437,176.6

Please refer to Chris V's answer as well : https://stackoverflow.com/a/35785869/1881812

oldboy
  • 5,729
  • 6
  • 38
  • 86
Ankan-Zerob
  • 3,378
  • 2
  • 18
  • 25
  • 4
    What is the rationale for this "A VARCHAR should always be used instead of TINYTEXT"? Wouldn't it be better (because more storage efficient) to use the smaller TINYTEXT sometimes? – vlasits Apr 21 '14 at 20:46
  • 27
    @vlasits read the included SO post for details. (1) all text types, including tinytext are stored as objects outside the row which is one overhead (2) These objects are then referenced by addresses 8 or 16 bytes. so no matter how tiny your tinytext is, you are adding unnecessary overheads, that too for a max size of 255 bytes. it is clear that varchar should be used, which wont have any of the above overheads. – Ankan-Zerob Apr 22 '14 at 14:08
  • 4
    @Ankan-Zerob Given that it appears very clear that TINYTEXT should never be used over VARCHAR, what is the rationale for even having it as an option? Is there some obscure use-case where it is necessary? – nextgentech May 25 '15 at 20:15
  • 5
    @nextgentech Have a look at https://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html. A record size is limited to 64 KiB. A table is limited to 4k columns. A `TINYTEXT` counts 1 byte + 8 byte against the record size, whereas a `VARCHAR(255)` counts from 1 byte + 255 byte up to 2 byte + 1020 byte (4 byte UTF-8 characters) against the record size. – Shi May 31 '15 at 00:37
  • 2
    I like expressing field sizes in words, but... English is normally considered to have around 5 characters per word, and there is also a space character to be stored; however, English will always be close to 1 byte per UTF-8 character, so I would divide by 6 giving around 40 / 10,000 / 2,700,000 / 710,000,000 words for the different sizes. Languages with lots of accents such as Polish would have slightly fewer words; Greek, Hebrew, Arabic, etc (with mostly 2-byte sequences) about half; CJK ideographs are 3 or 4-byte sequences, but I don't know how long words are. – ChrisV Feb 29 '16 at 19:05
  • @Ankan-Zerob With character sets like `utf8mb4` using a maximum of four bytes per character, wouldn't the worst case character count be x/4? – Vanlalhriata Jan 19 '17 at 10:17
  • @Vanlalhriata yes good catch there. however, those characters are outside the Basic Multilingual Plane [https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane], including the english dictionary that we are taking into consideration. if you have emoticons and special chinese characters, do factor it in :) – Ankan-Zerob Jan 19 '17 at 11:53
  • Wouldn't you need to increase the worst-case to account for the space character inbetween each average word length? Should be 5.5 chars per word, not 4.5. – Jon Jun 25 '21 at 21:45
  • *"max 3 bytes for utf-8"* Do note that this 'utf-8' is not, in fact, utf-8. It's MySQL's legacy/broken proprietary 3-byte encoding that only MySQL and MariaDB call utf-8 and the rest of the world calls 'what is this broken bs??'. The REAL utf-8 we all know and love is [max 4 bytes per character](https://stijndewitt.com/2014/08/09/max-bytes-in-a-utf-8-char) and is called `utf8mb4` in MySQL and MariaDB. If you are using MySQL < 8 or MariaDB and you are either not specifying the encoding or specifying utf-8, you are almost certainly doing it wrong. – Stijn de Witt Aug 17 '22 at 13:40
  • To read more about utf-8 vs utf8mb4, read my blog post on the topic: [Use Mysql utf8mb4 if you want full unicode support](https://stijndewitt.com/2015/06/15/use-mysql-utf8mb4-if-you-want-full-unicode-support/) – Stijn de Witt Aug 17 '22 at 13:53
56

Rising to @Ankan-Zerob's challenge, this is my estimate of the maximum length which can be stored in each text type measured in words:

      Type |         Bytes | English words | Multi-byte words
-----------+---------------+---------------+-----------------
  TINYTEXT |           255 |           ±44 |              ±23
      TEXT |        65,535 |       ±11,000 |           ±5,900
MEDIUMTEXT |    16,777,215 |    ±2,800,000 |       ±1,500,000
  LONGTEXT | 4,294,967,295 |  ±740,000,000 |     ±380,000,000

In English, 4.8 letters per word is probably a good average (eg norvig.com/mayzner.html), though word lengths will vary according to domain (e.g. spoken language vs. academic papers), so there's no point being too precise. English is mostly single-byte ASCII characters, with very occasional multi-byte characters, so close to one-byte-per-letter. An extra character has to be allowed for inter-word spaces, so I've rounded down from 5.8 bytes per word. Languages with lots of accents such as say Polish would store slightly fewer words, as would e.g. German with longer words.

Languages requiring multi-byte characters such as Greek, Arabic, Hebrew, Hindi, Thai, etc, etc typically require two bytes per character in UTF-8. Guessing wildly at 5 letters per word, I've rounded down from 11 bytes per word.

CJK scripts (Hanzi, Kanji, Hiragana, Katakana, etc) I know nothing of; I believe characters mostly require 3 bytes in UTF-8, and (with massive simplification) they might be considered to use around 2 characters per word, so they would be somewhere between the other two. (CJK scripts are likely to require less storage using UTF-16, depending).

This is of course ignoring storage overheads etc.

ChrisV
  • 8,748
  • 3
  • 48
  • 38
  • CJK characters may use 3 or 4 byte sequence: http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html – Raptor Oct 17 '16 at 08:10
13

This is nice but doesn't answer the question:

"A VARCHAR should always be used instead of TINYTEXT." Tinytext is useful if you have wide rows - since the data is stored off the record. There is a performance overhead, but it does have a use.

DCR
  • 14,737
  • 12
  • 52
  • 115
colin0117
  • 1,448
  • 1
  • 11
  • 15