0

I have a fairly large table with 60million entries.

Lets call the table I have as bigtable and the column as tracking_code (of type varchar(8)).

When I use the query

SELECT * FROM bigtable WHERE tracking_code LIKE 'SDMASGG_';

It takes 0.75 seconds to execute whereas the query

SELECT * FROM bigtable WHERE tracking_code LIKE 'SDMA_GGH';

takes more than 9 minuites.

I am intrigued by the selectivity shown by mysql towards the location of the '_'

Beri
  • 11,470
  • 4
  • 35
  • 57
user3392665
  • 29
  • 1
  • 9
  • maybe first query is cached? – Peter Apr 22 '15 at 13:09
  • 1
    In some case? Like is slow always. Best practice is avoid using LIKE. Bad new is that you can't optimize LIKE. – Alexander R. Apr 22 '15 at 13:09
  • 1
    Do you have an index on the tracking_code column? You are using the wildcard character "_" in both of your queries. Mysql can make use of an index in a query using "like" if the wildcard characters are at the end of the string, such as in your first example. In your second example, the wildcard is in the middle of your sting, so it can't use the index. – Trenton Trama Apr 22 '15 at 13:12
  • @Peter I don't think so, as I have tried with different tracking_codes and the response time was roughly the same. – user3392665 Apr 22 '15 at 13:13
  • mysql claims to use the index in both above cases, despite `_` being a wildcard; – pala_ Apr 22 '15 at 13:15
  • Have you tried using `EXPLAIN` to see if there is any difference in execution? – Eilidh Apr 22 '15 at 13:16
  • @AlexanderRavikovich I am perfectly happy with the 0.75 seconds response time of LIKE, just that pushing the '_' to somewhere in the middle does not agree well with LIKE ! – user3392665 Apr 22 '15 at 13:16
  • I'm guessing the query engine can write `SDMASGG_` as "begins with..." but having an underscore in the middle is more complex. – DavidG Apr 22 '15 at 13:20
  • @TrentonTrama I do have an index on the tracking_code column. – user3392665 Apr 22 '15 at 13:22
  • Though, as @pala_ says I don't see why the location of the '_' would render the index un-usable. – user3392665 Apr 22 '15 at 13:22
  • @user3392665 can you post the output of `explain` on both of the queries? – pala_ Apr 22 '15 at 13:23
  • Try 'SDMA\\_GGH' (Tnx to @DavidG for tip) – Alexander R. Apr 22 '15 at 13:25
  • If you're going to escape the underscore, just use `=` instead of `like` since its not doing any pattern matching at all any more. – pala_ Apr 22 '15 at 13:25
  • @AlexanderRavikovich the underscore is important here as it's being used in a LIKE clause so escaping it would make it the same as `=`. – DavidG Apr 22 '15 at 13:27
  • Yes. I was thinking first he is store some data that contains underscore. Anyway, if he have fixed range of allowed chars in string, for example A-Z, maybe first generate set of strings (SDMAAGGH, SDMA_BGGH, SDMACGGH....) and try to use IN() instead of LIKE. Not silver bullet, but for this specific case may work much faster with indexes. – Alexander R. Apr 22 '15 at 13:38
  • The index is used with `LIKE` unless the wildcard is the first character – Eilidh Apr 22 '15 at 13:56
  • @Eilidh and pala_ EXPLAIN produces identical output for both the queries – user3392665 Apr 22 '15 at 14:20
  • How many rows returned from each variation? – Rick James Apr 22 '15 at 18:48
  • @RickJames 1000 in the first case and 160000 in the second – user3392665 Apr 23 '15 at 09:01

3 Answers3

1

My comment was partially correct, though I made the mistake of thinking that an index wasn't used if the wildcard was anywhere other than the end of a string in a LIKE comparison.

Mysql BTREE indexes can be used in LIKE comparisons as long as

the argument to LIKE is a constant string that does not start with a wildcard character

If you have a wildcard in the middle of the string, it will only use the characters from the beginning to the first wildcard. In your second example, this means your query only uses the index to match "SDMA". After that, it will scan the remaining rows to find the match.

Only the part before the first wildcard serves as an access predicate. The remaining characters do not narrow the scanned index range—non-matching entries are just left out of the result.

Trenton Trama
  • 4,890
  • 1
  • 22
  • 27
  • 1
    Almost right, it won't scan the rows when it reaches the wildcard but it will iterate through the larger number of matching index entries – symcbean Apr 22 '15 at 22:35
0

There is the option to use a Full Text Search instead of LIKE. The only foreseeable issue I can see with this, is if you're not using myISAM. http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

In the case you're using something else like INNODB, you're going to have to use something like Lucene.

Jay
  • 2,107
  • 4
  • 20
  • 24
0

1000 in the first case and 160000 in the second

It takes a lot longer to shovel 160K rows to the client than to shovel just 1K.

A "fairer" test would be to do SELECT COUNT(*) ... instead of SELECT * ....

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • The SELECT COUNT(*)... for the first case it took 0.0 sec and 24 sec for the second. – user3392665 Apr 29 '15 at 11:16
  • I also tried SELECT COUNT(*)... with other tracking_codes: SELECT count(*) FROM bigtable WHERE tracking_code LIKE 'SDMASGG_'; count is 1000 returned in 0.0 sec. Whereas SELECT count(*) FROM bigtable WHERE tracking_code LIKE 'SD_ASGGH'; count is 653 still it takes 20 sec. – user3392665 Apr 29 '15 at 11:22
  • "0.0 sec" sounds like the Query cache kicked in. `SELECT SQL_NO_CACHE ...` will avoid it. – Rick James Apr 29 '15 at 13:51