1

As far as I understand, the only difference between the different TEXT column types in MySQL is the number of bytes necessary to store the length:

TINYTEXT    L + 1 bytes, where L < 2^8  
TEXT        L + 2 bytes, where L < 2^16
MEDIUMTEXT  L + 3 bytes, where L < 2^24
LONGTEXT    L + 4 bytes, where L < 2^32

So why isn't there just one column type with the maximum length and no necessity to store the length?

TEXT        L bytes, where L < 2^32
  • Have a look at this [post](http://stackoverflow.com/questions/7755629/varchar255-vs-tinytext-tinyblob-and-varchar65535-vs-blob-text) – 1000111 Apr 06 '16 at 11:46
  • @1000111 I understand the differences. I want to know *why*. –  Apr 06 '16 at 11:48

2 Answers2

1

There are historical reasons for this. MySQL was designed 20 years ago, using the ideas and the hardware and software limitations of that time. 32-bit hardware and Linux were something relatively new back in 1995. The usual amount of RAM installed in a desktop computer was somewhere around 4-8 MB. The servers had 16 or 32 MB.

The database design ideas it implemented were another 20 years old (or more), when the hardware and software limitations were even stronger (the computer programs used to run in several hundred kilobytes).

Back then, there were two different types to store strings in databases: CHAR and VARCHAR. They still exist now but the differences between them blur every day. The values of CHAR columns used to have a fixed length while VARCHAR used to have variable length, not exceeding a maximum length declared on column creation. (There are other differences too, regarding the handling of whitespace characters but they are not relevant for the discussion).

There are several reasons why fixed-length and variable-length string types were created. While variable-length seems better because it doesn't waste space, the data of fixed-length string columns can be stored in the same place as the data of other fixed-length data types (integers, dates etc). Having the same length for all the records of a table is very important at the low level because it provides constant time for finding and retrieving a row in the storage, given the row number (provided by an index). Variable-length strings cannot provide this functionality, that's why they were stored in a different place, using a different way to retrieve them. A different place means an additional disk read in order to get them into memory and this takes time. Both fixed-length and variable-length string types have advantages and disadvantages.

Regarding the sizes, back in the 80s (and before that) the RDBMSes used to implement only CHAR/VARCHAR (max. 255 bytes) and TEXT. For most database systems at that time, the maximum length for a TEXT field wasn't larger than 64KiB.

MySQL implemented the data types in use by the existing database systems in 1995 and extended them.

Due to the hardware improvements and the reduction of storage cost per GB in the last 20-25 years, the technical reasons that generated the existence of CHAR and VARCHAR string types and their various size flavours became less and less important. As a consequence, MySQL unified their internal handling and storage, as you remarked in their documentation.

axiac
  • 68,258
  • 9
  • 99
  • 134
  • Thank you. And the different types have to be maintained for backward compatibility, I guess. –  Apr 06 '16 at 15:40
0

For the same reason there are different number sizes (tinyint, smallint, etc.): the bigger each record is the more space it occupies, which consume hard drive space, makes searchs slower, etc., and when number when the number of records scalate this issues do as well. As a general rule of thumb is wise to always use the smallest type possible.

Zalomon
  • 553
  • 4
  • 14
  • This is wrong. As the manual states, the content size is irrelevant for differences in the size of the storage, and the difference in storage size would disappear if there where only one type (that would need no extra bytes to store the size). –  Apr 06 '16 at 11:45
  • If the reason for mysql to keep track of the file size was the fact that there are different data types for TEXT it would be a stupid thing to do. Don't you think? Since each table has only one type on each column. By accepting 4 types you can use the one that gives you the lesser overhead – Zalomon Apr 06 '16 at 13:09
  • My guess of because they keep track of the size of the file (and what I think is what you actually want to know) is that they needed to know where to read to retrieve. Simplifying a great deal: imagine a huge array where you got the pointer to the the start of each diferent row and use the length to know where to stop. – Zalomon Apr 06 '16 at 13:13