80

I have a MySQL table for which I do very frequent SELECT x, y, z FROM table WHERE x LIKE '%text%' OR y LIKE '%text%' OR z LIKE '%text%' queries. Would any kind of index help speed things up?

There are a few million records in the table. If there is anything that would speed up the search, would it seriously impact disk usage by the database files and the speed of INSERT and DELETE statements? (no UPDATE is ever performed)

Update: Quickly after posting, I have seen a lot of information and discussion about the way LIKE is used in the query; I would like to point out that the solution must use LIKE '%text%' (that is, the text I am looking for is prepended and appended with a % wildcard). The database also has to be local, for many reasons, including security.

kmindi
  • 4,524
  • 4
  • 31
  • 47
Tom
  • 6,991
  • 13
  • 60
  • 78
  • 9
    Think of it this way it is not a perfect analogy, but it demonstrates the point. Consider a book with a word index at the back. If you want to look up a simple word or even words that start with the same first few letters, no problem. Consider what happens when you want to use that book index to look up all words that contain the letters "exe" anywhere in the word. The index is pretty much useless and you might as well read the actual book looking for those words. That's roughly the problem you are imposing on mySQL. – JohnFx Sep 08 '12 at 15:35

9 Answers9

84

An index wouldn't speed up the query, because for textual columns indexes work by indexing N characters starting from left. When you do LIKE '%text%' it can't use the index because there can be a variable number of characters before text.

What you should be doing is not use a query like that at all. Instead you should use something like FTS (Full Text Search) that MySQL supports for MyISAM tables. It's also pretty easy to make such indexing system yourself for non-MyISAM tables, you just need a separate index table where you store words and their relevant IDs in the actual table.

Update

Full text search available for InnoDB tables with MySQL 5.6+.

zen
  • 980
  • 6
  • 18
reko_t
  • 55,302
  • 10
  • 87
  • 77
  • 10
    I should add that since MySQL 5.6, full text search is also available for InnoDB. – Sergio Apr 23 '13 at 13:11
  • 1
    There are some cases where a full-text search isn't possible though, like with ideographic languages. https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html – Ian Dunn Apr 06 '17 at 23:42
  • 1
    Note that `FULLTEXT` also lets you do all 3 tests at once: `MATCH(x, y, z) AGAINST('text')`. – Rick James Dec 20 '18 at 23:26
  • 1
    full text search uses word boundary and not wildcard saerch – luky May 21 '21 at 14:48
  • I don't understand why this answer is accepted as the provided soultions is not what the OP asked for, it must work same as `%string%` and full-text doesn't do wildcard like search – Abdul Rehman Oct 27 '21 at 10:05
30

An index won't help text matching with a leading wildcard, an index can be used for:

LIKE 'text%'

But I'm guessing that won't cut it. For this type of query you really should be looking at a full text search provider if you want to scale the amount of records you can search across. My preferred provider is Sphinx, very full featured/fast etc. Lucene might also be worth a look. A fulltext index on a MyISAM table will also work, but ultimately pursuing MyISAM for any database that has a significant amount of writes isn't a good idea.

Michael
  • 8,538
  • 2
  • 21
  • 20
  • 2
    In MySQL 5.6 and up, they can also be used with InnoDB tables. https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html – Lukas Liesis Jun 13 '17 at 07:43
23

An index can not be used to speed up queries where the search criteria starts with a wildcard:

LIKE '%text%'

An index can (and might be, depending on selectivity) used for search terms of the form:

LIKE 'text%'

Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
23

Add a Full Text Index and Use MATCH() AGAINST().

Normal indexes will not help you with like queries, especially those that utilize wildcards on both sides of the search term.

What you can do is add a full text index on the columns that you're interested in searching and then use a MATCH() AGAINST() query to search those full text indexes.

  1. Add a full text index on the columns that you need:

    ALTER TABLE table ADD FULLTEXT INDEX index_table_on_x_y_z (x, y, z);
    
  2. Then query those columns:

    SELECT * FROM table WHERE MATCH(x,y,z) AGAINST("text")
    

From our trials, we found these queries to take around 1ms in a table with over 1 million records. Not bad, especially compared to the equivalent wildcard LIKE %text% query which takes 16,400ms.

Benchmarks

MATCH(x,y,z) AGAINST("text") takes 1ms

LIKE %text% takes 16400ms

16400x faster!

Joshua Pinter
  • 45,245
  • 23
  • 243
  • 245
12

I would add that in some cases you can speed up the query using an index together with like/rlike if the field you are looking at is often empty or contains something constant.

In that case it seems that you can limit the rows which are visited using the index by adding an "and" clause with the fixed value.

I tried this for searching 'tags' in a huge table which usually does not contain a lot of tags.

SELECT * FROM objects WHERE tags RLIKE("((^|,)tag(,|$))" AND tags!=''

If you have an index on tags you will see that it is used to limit the rows which are being searched.

OderWat
  • 5,379
  • 3
  • 29
  • 31
  • 1
    This is surprisingly a very good alternative, I was able to index the column this way. – Kenneth Jun 11 '15 at 14:40
  • Better: `RLIKE '[[:<:]]tag[[:>:]]'` -- that is, use "word boundaries". – Rick James Dec 20 '18 at 23:24
  • 1
    @RickJames This was nott about the regular expression but about adding the AND to the query. Besides this, the regular expression used depends on the data. If your tags contain spaces you won't be happy with word boundaries. I am also not sure if searching word boundaries instead a small set of delimiters will be faster on a microbenchmark level. – OderWat Dec 21 '18 at 12:52
  • @OderWat - A small test: `SELECT "What about this?" REGEXP "[[:<:]]about this[[:>:]]";` says `1` (true). – Rick James Dec 21 '18 at 17:24
  • 1
    @RickJames of course this is true. But using word delimiter is still not practical for searching tags from a field. Think of: `SELECT "white lion steak,lion steak,white" REGEXP "[[:<:]]white lion[[:>:]]"`. This is true too, but should probably not. It all depends on how your data is stored. – OderWat Dec 22 '18 at 02:47
6

Maybe you can try to upgrade mysql5.1 to mysql5.7.

I have about 70,000 records. And run following SQL:

select * from comics where name like '%test%'; 

It takes 2000ms in mysql5.1. And it takes 200ms in mysql5.7 or mysql5.6.

lingceng
  • 2,415
  • 1
  • 18
  • 19
  • 5
    That really smells as if not all the data was cached in the buffer_pool on 5.1. 10:1 is a typical factor. And nothing in this area changed from 5.1 to 5.6. – Rick James Dec 20 '18 at 23:20
4

Another way:

You can mantain calculated columns with those strings REVERSEd and use

SELECT x, y, z FROM table WHERE x LIKE 'text%' OR y LIKE 'text%' OR z LIKE 'text%' OR xRev LIKE 'txet%' OR yRev LIKE 'txet%' OR zRev LIKE 'txet%' 

Example of how to ADD a stored persisted column

ALTER TABLE table ADD COLUMN xRev VARCHAR(N) GENERATED ALWAYS AS REVERSE(x) stored;

and then create an indexes on xRev, yRev etc.

Theo
  • 57,719
  • 8
  • 24
  • 41
0

Another alternative to avoid full table scans is selecting substrings and checking them in the having statement:

SELECT 
    al3.article_number,
    SUBSTR(al3.article_number, 2, 3) AS art_nr_substr,
    SUBSTR(al3.article_number, 1, 3) AS art_nr_substr2,
    al1.*
FROM
    t1 al1 
    INNER JOIN t2 al2 ON al2.t1_id = al1.id
    INNER JOIN t3 al3 ON al3.id = al2.t3_id
WHERE
    al1.created_at > '2018-05-29'
HAVING 
    (art_nr_substr = "FLA" OR art_nr_substr = 'VKV' OR art_nr_subst2 = 'PBR');
Juri Sinitson
  • 1,445
  • 1
  • 14
  • 18
0

When you optimize a SELECT foo FROM bar WHERE baz LIKE 'ZOT%' query, you want the index length to at least match the number of characters in the request.

Here is a real life example from just now:

Here is the query:

EXPLAIN SELECT COUNT(*) FROM client_detail cd
JOIN client_account ca ON cd.client_acct_id = ca.client_acct_id
WHERE cd.first_name LIKE 'XX%' AND cd.last_name_index LIKE 'YY%';

With no index:

+-------+
| rows  |
+-------+
| 13994 |
|     1 |
+-------+

So first try a 4x index,

CREATE INDEX idx_last_first_4x4 on client_detail(last_name_index(4), first_name(4));
+------+
| rows |
+------+
| 7035 |
|    1 |
+------+

A bit better, but COUNT(*) shows there are only 102 results. So lets now add a 2x index:

CREATE INDEX idx_last_first_2x2 on client_detail(last_name_index(2), first_name(2));

yields:

+------+
| rows |
+------+
|  102 |
|    1 |
+------+

Both indexes are still in place at this point, and MySQL chose the latter index for this query---however it will still choose the 4x4 query if it is more efficient.

Index ordering may be useful, try the 2x2 before the 4x4 or vice-versa to see how it performs for your environment. To re-order an index you have to drop and re-create the earlier one.

KJ7LNW
  • 1,437
  • 5
  • 11