4

I need to store text in a MySQL database. The size of the text is different on each row, generally about 10kb, but can be 100kb in some very rare cases.

The text will be accessed pretty frequently for both reads and writes. For this reason I believe it will be best to make it VARCHAR as it's a lot faster than TEXT.

Question: Is it possible to make the column VARCHAR and in the very rare case the text is more than 64kb, somehow store this as TEXT?

Gumbo
  • 643,351
  • 109
  • 780
  • 844
We're All Mad Here
  • 1,544
  • 3
  • 18
  • 46
  • 1
    You can't change the datatype of a column in this way, a column has a datatype, not an arbitrary type based on the actual content that it contains.... what makes you believe that it's better to use VARCHAR rather than TEXT? – Mark Baker Feb 01 '14 at 18:22
  • 1
    You probably don't need to worry about it. – vascowhite Feb 01 '14 at 18:23
  • 5
    Maybe you can have two fields (a varchar and a text) and your application will choose whatever is better when inserting the data. –  Feb 01 '14 at 18:24
  • I need to access the data frequently and this: http://forums.mysql.com/read.php?24,105964,105964 made me believe using TEXT is a bad choice for frequently accessed data. – We're All Mad Here Feb 01 '14 at 18:24
  • If you need to index large volumes of text, use full text indexing on a TEXT column; if you don't need indexing, but still need data up to 100k, use a TEXT column – Mark Baker Feb 01 '14 at 18:26
  • I don't need to index the data. – We're All Mad Here Feb 01 '14 at 18:27
  • 1
    try blob since it's binary vs text which isn't – Eric Feb 01 '14 at 18:27
  • If you don't need t index the data, then you don't have any problems with using text.... the link you posted is about indexes on varchar and text datatypes.... of course, putting the primary key on a TEXT or an extremely large VARCHAR isn't generally a sensible idea – Mark Baker Feb 01 '14 at 18:28
  • @MarkBaker, you can't put a primary key or unique key constraint on a TEXT column, or a VARCHAR that is too long. Keys have a maximum length of 1000 bytes. – Bill Karwin Feb 01 '14 at 18:30
  • @BillKarwin - the article linked in the comment compares a PK index on a TEXT column (MyISAM engine) and a PK Index on a VARCHAR... but only on the first 255 characters of the TEXT column, and the VARCHAR is 255 long – Mark Baker Feb 01 '14 at 18:32
  • Okay, I think I will use 2 columns, one `VARCHAR` and one `MEDIUMTEXT` and have my program use the `VARCHAR` when possible. This wastes 3 bytes per row and I need to read the `VARCHAR` all the time to see if empty. Is this a good idea? – We're All Mad Here Feb 01 '14 at 18:35
  • Is the overhead in your code for determining which field to use when storing, or which field has been used when retrieving the data really worth 3 bytes per row of disk space? It feels like you're creating potential problems for yourself trying to eliminate a problem that doesn't actually exist – Mark Baker Feb 01 '14 at 18:42
  • I believe it's worth 3 bytes per row for the performance improvement of `VARCHAR` vs `MEDIUMTEXT`, given that 99% of my columns will be `10kb`. Now, to determine which field to use, I will just always read the `VARCHAR`. If empty, read the TEXT. 99% of time the `VARCHAR` will contain data. – We're All Mad Here Feb 01 '14 at 18:46

3 Answers3

5

The largest VARCHAR you can declare is 64KB. The largest length for a TEXT column is 64KB.

See http://dev.mysql.com/doc/refman/5.6/en/string-type-overview.html

If you need longer strings, you should use MEDIUMTEXT. This data type can store up to 16MB.

Where did you get the idea that VARCHAR is "a lot faster than TEXT?" In the InnoDB storage engine, VARCHAR, TEXT, and BLOB are stored identically.

I read the forum article linked to, and my reactions are:

  • The example shown in that thread uses the MyISAM storage engine, which is virtually obsolete and bad practice to use.

  • The writer misunderstands what "Using index" means. It really means "Using only index, removing the need to read the table row." When you try to use a prefix index, this spoils any chance of doing an index-only query.

Trying to use a prefix index for a PRIMARY or UNIQUE field has some unintended consequences, like uniqueness cannot be enforced properly:

mysql> create table foo (t text, primary key(t(50)));
mysql> insert into foo values (concat(repeat('a',50), 'x'));
mysql> insert into foo values (concat(repeat('a',50), 'y'));
ERROR 1062 (23000): Duplicate entry
  'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaay' for key 'PRIMARY'

Okay, I think I will use 2 columns, one VARCHAR and one MEDIUMTEXT and have my program use the VARCHAR when possible. This wastes 3 bytes per row and I need to read the VARCHAR all the time to see if empty. Is this a good idea?

I wouldn't do this. It complicates your application when you have to check which column your data is in. Any trivial advantage you might get from storing the strings in one column type or the other are outweighed by the extra coding work you have to do, and the potential bugs you will introduce.

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Okay, your answer made me understand a lot of things but I have a last question: What's the performance diff in InnoDB of `MEDIUMBLOB` vs `VARBINARY` `64kb`? I am storing images there. – We're All Mad Here Feb 01 '14 at 19:03
  • In InnoDB, VARCHAR, TEXT, BLOB, and VARBINARY are all of a kind. They are all [stored in exactly the same way](http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/), so there should be no significant performance difference. – Bill Karwin Feb 01 '14 at 19:12
2

SQL is strongly typed so there is no dynamic type switching. You can use a LONGTEXT column and create an index on the first x-characters of the column. This should give you the query performance you need while allowing you to store large values.

CREATE INDEX ltcol_idx ON my_table (ltcol(64));
David H. Bennett
  • 1,822
  • 13
  • 16
2

I used to store the text as blobs, it can hold ~64k of UTF-8 text symbols in InnoDB engine.

vodolaz095
  • 6,680
  • 4
  • 27
  • 42
  • The BLOB datatype doesn't preserve character set information. At my company, we have helped quite a few of our clients who stored strings in BLOB with different character sets, and they got their data all confused. BLOB and TEXT are otherwise the same data type, so just store text in TEXT and save yourself some potential grief. – Bill Karwin Feb 01 '14 at 19:14