8

I have a field "descr" varchar(15000) which will regularly hold between 1000 and 5000 characters of text per row, is returned in every query, and will be frequently searched with LIKE %search% (the database is mysql 5.5 and fulltext indexes aren't available). The text inputted is research data and so being unique is not a need - but being searchable is.

The table is innodb with utf-8 encoding. The number of rows is not large (30,000). Maximum index size on the varchar is (255), yet when I do a search on the column, it correctly returns a row with 3000 character input.

I have read a lot on indexing, the most relevant being MySQL: Large VARCHAR vs. TEXT?:

  • TEXT is stored off the table with the table .
  • VARCHAR is stored inline and is much faster when the size is reasonable and the data is frequently retrieved.

I need help understanding:

 1. What is the performance impact on retrieval (with 30,000 rows)
    going to a text field and     
 2. Is a varchar index workable for
    searching through 3000-5000 character fields? How is the search able
    to find strings with data longer than the 255 varchar index anyway?
    Or would you advise going with mediumtext?

Thanks for your input.

Community
  • 1
  • 1
mseifert
  • 5,390
  • 9
  • 38
  • 100

1 Answers1

10

First, it doesn't matter how you index the column if your search uses LIKE %search% because that predicate can't make use of a BTREE index. It's going to do a table-scan regardless of whether you choose VARCHAR or TEXT.

Second, there's no difference between how InnoDB stores VARCHAR vs. TEXT. They are both handled as variable-length strings. If they can fit on the same data page with other columns, then they do. If they're too long for a page (or really a little less than half of one page, because each page must fit at least two rows, plus some header information), then only 768 bytes are stored on the page, and the remainder of the string goes on overflow pages. Unless you declare the table to to use ROW_FORMAT=DYNAMIC, in which case all of the string goes on overflow pages if it doesn't fit on the page. See also http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/

You may also be interested in my presentation Full Text Search Throwdown. I compare other companion technology, such as Sphinx Search.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank you for the info - it helps a lot. The Full Text Search Throwdown options look like they are only viable alternatives if I can install software on the server, which I cannot (please correct me if I am wrong). Looks like I will have to live with what I have until my hosting company upgrades to mysql 5.6. I don't think I am ready to switch to isam at this point - unless search times become unbearable. Hopefully on up to 30k rows, it can plow through them sufficiently fast. – mseifert Oct 26 '13 at 15:40
  • 1
    One strategy is to leave your data in InnoDB, but create a MyISAM table with a *copy* of the columns you want to create a fulltext index for. That way when your MyISAM table gets corrupted, you have the original data safely in InnoDB and you can recreate the MyISAM table. – Bill Karwin Oct 26 '13 at 17:36
  • To be clear, I don't recommend using MyISAM if you can avoid it. But if you treat it like any ephemeral cache and only store stuff in it that is easily recreated, you can take advantage of one of its last features that still makes it worthwhile -- fulltext indexes. BTW, you know InnoDB supports FTS in 5.6, but based on our experience it needs some further work before it can be considered reliable. – Bill Karwin Oct 26 '13 at 17:38
  • That's an interesting idea - I will give it some thought. The only major downside seems like a double write with every edit and occasional regeneration of the MyISAM table - not a big deal. I was beginning to think this was another dumb question with the obvious no answer. I love this site - the support & creativeness of the community, as a whole, is a powerful resource. Based on this idea, I can say this thread is answered. Thanks again. – mseifert Oct 27 '13 at 02:57