1

All,

I'm writing a web app that will receive user generated text content. Some of those inputs will be a few words, some will be several sentence long. In more than 90% of cases, the inputs will be less than 800 characters. Inputs need to be searchable. Inputs will be in various character sets, including Asian. The site and the db are based on utf8.

I understand roughly the tradeoffs between VARCHAR and TEXT. What I am envisioning is to have both a VARCHAR and a TEXT table, and to store inputs on one or the other depending on their size (this should be doable by the PHP script).

What do you think of having several tables for data based on its size? Also, would it make any sense to create several VARCHAR tables for various size ranges? My guess is that I will get a large number of user inputs clustered around a few key sizes.

Thanks,

JDelage

JDelage
  • 13,036
  • 23
  • 78
  • 112

3 Answers3

5

Storing values in one column vs another depending on size of input is going to add a heck of a lot more complexity to the application than it'll be worth.

As for VARCHAR vs TEXT in MySQL, here's a good discussion about that, MySQL: Large VARCHAR vs TEXT.

The "tricky" part is doing a full-text search on this field which requires the use of MyISAM storage engine as it's the only one that supports full-text indexes. Also of note is that sometimes at the cost of complicating the system architecture, it might be worthwhile to use something like Apache Solr as it perform full-text search much more efficiently. A lot of people have most of the data in their MySQL database and use something like Solr just for full-text indexing that text column and later doing fancy searches with that index.

Re: Unicode. I've used Solr for full-text indexing of text with Unicode characters just fine.

Community
  • 1
  • 1
Jinyoung Kim
  • 2,171
  • 20
  • 32
4

Comments are correct. You are only adding 1 byte by using the TEXT datatype over VARCHAR.

Storage Requirements:

  • VARCHAR Length of string + 1 byte
  • TEXT Length of string + 2 bytes
James L.
  • 4,032
  • 1
  • 15
  • 15
1

The way I see it is you have two options:

  • Hold it in TEXT, it will waste single additional byte on storage and additional X processing power on search.
  • Hold it in VARCHAR, create additional table named A_LOT_OF_TEXT with the structure of (int row_id_of_varchar_table, TEXT). If the data is small enough, put it in varchar, otherwise put a predefined value instead of data, for example 'THE_DATA_YOU_ARE_LOOKING_FOR_IS_IN_TABLE_NAMED_A_LOT_OF_TEXT' or just simply NULL and put the real data to table A_LOT_OF_TEXT.