899

I've got a messages table in MySQL which records messages between users. Apart from the typical ids and message types (all integer types) I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters which means the messages would never be inserted into the db as longer than this.

Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow but would be good to get views specific to this type of common message storing.

BenMorel
  • 34,448
  • 50
  • 182
  • 322
Tom
  • 30,090
  • 27
  • 90
  • 124
  • 34
    A bit old, but I came here because I ran into a problem that made me think about this. In my case my front-end form was limited to 2,000 characters but the encoding implicit in my storage method encoded international characters as multiple characters (which can apparently anywhere from 3 - 12 per character). So my 2,000 suddenly becomes up to 24,000. Something to think about... – James S Mar 24 '14 at 22:40
  • 3
    I have found text to be significantly faster for many concurrent inserts. – Ray S. Mar 28 '14 at 08:47
  • 1
    @JamesS: utf8mb4... >. – indivisible Apr 20 '15 at 22:44
  • 1
    Here is a new thread: https://dba.stackexchange.com/questions/210408/in-what-cases-are-blob-and-text-stored-in-line-on-innodb – Rick James Jun 22 '18 at 22:08
  • 1
    @Rick James - The question you have linked to is not the same question at all. Please note that this thread has been viewed 400k times and you propose to replace it with something with 35 views and your own answer as the top answer? This question is still perfectly valid and the answers here a useful record. – Tom Jun 23 '18 at 20:20
  • 12
    @RickJames consider posting an updated answer, rather than close the question –  Jun 24 '18 at 02:19
  • 3
    @YvetteColomb - I added an Answer. I would mainly like to get rid of the Accepted Answer because it is _out of date_. I came to the Q&A because someone was quoting incorrect info, saying "754 upvotes, so it must be right". OK, I edited the Approved answer, too. (Though that feels improper.) – Rick James Jun 25 '18 at 16:09
  • 1
    [MySQL :: MySQL 8.0 Reference Manual :: 15.10 InnoDB Row Formats](https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html) – Jason Law Apr 17 '20 at 03:10

9 Answers9

844
  • TEXT and BLOB may by stored off the table with the table just having a pointer to the location of the actual storage. Where it is stored depends on lots of things like data size, columns size, row_format, and MySQL version.

  • VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable, the tradeoff of which would be faster depends upon your data and your hardware, you'd want to benchmark a real-world scenario with your data.

Justin Johnson
  • 30,978
  • 7
  • 65
  • 89
MindStalker
  • 14,629
  • 3
  • 26
  • 19
  • 155
    +1: VARCHAR (stored inline) is usually faster IF the data is frequently retrieved (included by most queries). However, for a large volume of data that is not normally retrieved (that is, not referenced by any query), then it may be better to not have the data stored inline. There is an upper limit on the row size, for data stored inline. – spencer7593 Jan 14 '11 at 17:54
  • 23
    @Pacerier: the exact benefit of avoiding "inline" storage is an increase in the number of rows that can be stored in a block, which means the table rows occupy fewer blocks in the InnoDB buffer cache (smaller memory footprint), and means fewer blocks to be transferred to and from disk (reduced I/O). But, this is only a performance benefit if the columns stored "off row" are largely unreferenced by queries. If those "off row" columns are referenced by most queries, that benefit largely evaporates. Inline is preferred if the columns fit in the max rowsize and are frequently referenced. – spencer7593 Jun 04 '13 at 22:35
  • 253
    "VARCHAR is faster when the size is reasonable". What is a "reasonable" number of characters, 100? 1000? 100,000? – tim peterson Sep 22 '13 at 13:56
  • 135
    This answer is not correct for InnoDB. Both VARCHAR and BLOB/TEXT are stored inline with other columns if the value on a given row fits in the page size (16KB and each page must hold at least two rows). If the string is too large for that, it overflows to additional pages. See http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ for a detailed explanation. – Bill Karwin Jan 01 '14 at 21:43
522

Can you predict how long the user input would be?

VARCHAR(X)

Max Length: variable, up to 65,535 bytes (64KB)
Case: user name, email, country, subject, password


TEXT

Max Length: 65,535 bytes (64KB)
Case: messages, emails, comments, formatted text, html, code, images, links


MEDIUMTEXT

Max Length: 16,777,215 bytes (16MB)
Case: large json bodies, short to medium length books, csv strings


LONGTEXT

Max Length: 4,294,967,29 bytes (4GB)
Case: textbooks, programs, years of logs files, harry potter and the goblet of fire, scientific research logging

There's more information on this question.

ashleedawg
  • 20,365
  • 9
  • 72
  • 105
Michael J. Calkins
  • 32,082
  • 15
  • 62
  • 91
  • 9
    Predictability is really a side item here. It's actually maximum expected length that should be the deciding factor. The items you mention as more predictable are only that way because they are *shorter* than the others. – Andrew Barber Nov 01 '12 at 19:46
  • 34
    @andrew-barber That's my point though. All the other posts explain well about the differences but not about the situations when you actually have to make a choice between the two. I was trying to point out using varchar for predictably short is a good choice and using text for arbitrarily long is a good choice. – Michael J. Calkins Nov 01 '12 at 20:28
224

Just to clarify the best practice:

  1. Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)

  2. String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).

I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.

What is it about the messages themselves that forces them to never be more then 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.

pb2q
  • 58,613
  • 19
  • 146
  • 147
James
  • 3,852
  • 2
  • 19
  • 14
  • 1
    So what would you recommend dealing with over-sized URLs like you have from`magnet:` URLs? They can be VERY much longer than just 255 characters. – Roland Jun 19 '22 at 20:52
39

Short answer: No practical, performance, or storage, difference.

Long answer:

There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)

For smaller limits in VARCHAR, there are some advantages over TEXT.

  • "smaller" means 191, 255, 512, 767, or 3072, etc, depending on version, context, and CHARACTER SET.
  • INDEXes are limited in how big a column can be indexed. (767 or 3072 bytes; this is version and settings dependent)
  • Intermediate tables created by complex SELECTs are handled in two different ways -- MEMORY (faster) or MyISAM (slower). When 'large' columns are involved, the slower technique is automatically picked. (Significant changes coming in version 8.0; so this bullet item is subject to change.)
  • Related to the previous item, all TEXT datatypes (as opposed to VARCHAR) jump straight to MyISAM. That is, TINYTEXT is automatically worse for generated temp tables than the equivalent VARCHAR. (But this takes the discussion in a third direction!)
  • VARBINARY is like VARCHAR; BLOB is like TEXT.
  • A table with several 'large' VARCHARs could hit a limit of 64KB for the whole table definition; switching to TEXT is a simple and practical fix. (Example: (42000) Row size too large, from an Oracle dump to a MySQL dump )

Rebuttal to other answers

The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.

When this thread was started and answered, there were only two "row formats" in InnoDB. Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.

The storage location for TEXT and VARCHAR() is based on size, not on name of datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this .

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • 2
    @KostaKontos - Thanks for the praise and the typo fix. When I see a need for a better answer, I will add an answer, even if 8 years and 800 upvotes too late. – Rick James Apr 02 '20 at 15:06
34

Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.

I think TEXT is stored outside the mysql row, while I think VARCHAR is stored as part of the row. There is a maximum row length for mysql rows .. so you can limit how much other data you can store in a row by using the VARCHAR.

Also due to VARCHAR forming part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.

Michael Anderson
  • 70,661
  • 7
  • 134
  • 187
  • 39
    The row length limit is 65,535 bytes [ http://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html ]. If your column is utf8-encoded, that means a 3000-character `varchar` column can take up to 9000 bytes. – Jan Fabry Jan 07 '10 at 21:05
  • 7
    UTF-8 characters can be up to 4 bytes, so I think you meant 12,000 bytes (unless there is some MySQL thing I'm not understanding here). – raylu Jul 10 '11 at 03:15
  • 13
    @raylu MySQL's UTF-8 is "fake UTF-8" in that it only supports 3 bytes per character max, so there is no way to directly store unicode characters beyond BMP plane in MySQL's UTF-8. This is fixed in MySQL 5.5. – Pacerier Jul 06 '12 at 05:28
  • 2
    I believe that this assertion is valid for MyISAM only. I can't find a definitive source but I believe that InnoDB stores `TEXT` inline in the table as well. – dotancohen Dec 02 '13 at 14:39
  • 2
    @dotancohen I found a source here explaining that storing of variable length data using InnoDB may vary (can be stored externally or inline within the row) http://mysqlserverteam.com/externally-stored-fields-in-innodb/ – KiX Ortillan Aug 28 '15 at 00:43
  • Depending or row_format, row size, and other things, none, some, or all of a `TEXT` or `VARCHAR` is stored off-record. You can't make a simple statement about what is done in InnoDB. – Rick James Jun 25 '18 at 21:05
  • @AnthonyRutledge - 5.1 does _not have_ utf8mb4; 5.5-5.7 have utf8mb4 as an _option_; 8.0 _defaults_ to the full 4-byte UTF-8. – Rick James Jun 25 '18 at 21:07
  • @RickJames Yes, but it does have a three byte UTF-8. – Anthony Rutledge Jun 25 '18 at 22:36
  • @AnthonyRutledge - Yes, `utf8` (3-byte max) has been around since 4.1. – Rick James Jun 25 '18 at 23:37
  • @RickJames Shame about the cursors, views, triggers, and stored procedures, though. – Anthony Rutledge Jun 26 '18 at 00:54
7

The preceding answers don't insist enough on the main problem: even in very simple queries like

(SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id) 

a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if you have in your table say 500 000 lines with a VARCHAR(65000) field, this column alone will use 6.5*5*10^9 byte. Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.

Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/ (This refers to the handling of TEXT vs VARCHAR in "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)

Gerry
  • 10,337
  • 3
  • 31
  • 40
Max
  • 415
  • 5
  • 12
4

Varchar is for small data like email addresses, while Text is for much bigger data like news articles, Blob for binary data such as images.

The performance of Varchar is more powerful because it runs completely from memory, but this will not be the case if data is too big like varchar(4000) for example.

Text, on the other hand, does not stick to memory and is affected by disk performance, but you can avoid that by separating text data in a separate table and apply a left join query to retrieve text data.

Blob is much slower so use it only if you don't have much data like 10000 images which will cost 10000 records.

Follow these tips for maximum speed and performance:

  1. Use varchar for name, titles, emails

  2. Use Text for large data

  3. Separate text in different tables

  4. Use Left Join queries on an ID such as a phone number

  5. If you are going to use Blob apply the same tips as in Text

This will make queries cost milliseconds on tables with data >10 M and size up to 10GB guaranteed.

Achraf Almouloudi
  • 756
  • 10
  • 27
Creative87
  • 125
  • 9
4

There is a HUGE difference between VARCHAR and TEXT. While VARCHAR fields can be indexed, TEXT fields cannot. VARCHAR type fields are stored inline while TEXT are stored offline, only pointers to TEXT data is actually stored in the records.

If you have to index your field for faster search, update or delete than go for VARCHAR, no matter how big. A VARCHAR(10000000) will never be the same as a TEXT field bacause these two data types are different in nature.

  • If you use you field only for archiving
  • you don't care about data speed retrival
  • you care about speed but you will use the operator '%LIKE%' in your search query so indexing will not help much
  • you can't predict a limit of the data length

than go for TEXT.

Viktor Joras
  • 721
  • 9
  • 16
  • Partially misleading info: TEXT columns cannot be index in their entirety. When you include a TEXT column in the index you must specify the length. Also VARCHARs cannot be indexed in their in their entirety in the case of VARCHARs > 255 as there is a max length on the index size. – eRadical May 05 '20 at 14:08
1

Just a correction to so many answers here, even if it is a bit late to the party.

Text fields can be fully indexed by MySQL as per their documentation.

Link provided https://dev.mysql.com/doc/refman/5.6/en/column-indexes.html

Overall Varchar fields are longer to write to then Text fields, but it only matters if you have multitude of write requests