1

I have another follow-up from this question. Although the LIKE pattern search (because of the collation set) uses the index and is much faster than LIKE BINARY, doing an explain over both of the queries seems to say that both queries use the index.

explain select count(*) from TransactionApp_transactions where merchantId like 'VCARD000%'

| id | select_type | table                       | type  | possible_keys                        | key                                 | key_len | ref  | rows     | Extra                    |
+----+-------------+-----------------------------+-------+--------------------------------------+--------------------------------------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | TransactionApp_transactions | range | TransactionApp_transactions_fc3e7169 | TransactionApp_transactions_fc3e7169 | 767     | NULL | 12906834 | Using where; Using index |

And I get the exact same EXPLAIN output for explain select count(*) from TransactionApp_transactions where merchantId like binary 'VCARD000%'; (with the rows being a slightly smaller number)

The key column contains the index name for both EXPLAIN outputs, but the LIKE BINARY takes 26 seconds, compared to just 2 seconds for the simple LIKE.

Sidharth Samant
  • 714
  • 8
  • 28

2 Answers2

1

A query using WHERE column LIKE 'value%' is much faster than a query using WHERE column LIKE BINARY 'value%' given an index on column. This is entirely expected. Why?

The LIKE 'value%' filter uses the index to find matching rows. It assumes the 'value%' text to match uses the same collation as the column. And, indexes use the collation of the columns they index; the collation is baked into the index. In the scenario where the column contains case-insensitive text, this makes it possible for searches for 'value%', 'Value%', and 'VALUE%' all to use the index even though case must be ignored in the search.

When you use LIKE BINARY rather than LIKE, you declare what collation you need for the search. You tell MySQL's query planner it cannot use the index it has, because that index's baked-in collation is not useful for your query. So it does a full table scan. That is slow.

If you want case-sensitive searches, declare a case-sensitive collation for the column when you create (or alter) the table. Then the LIKE filter will be case-sensitive.

To get a case-sensitive collation you can create the table something like this (you didn't show your table definition so this is a guess).

  CREATE TABLE tbl (
     ...
     mid VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_bin,
     ...
  )

or you can alter the column something like this to have a case-sensitive collation.

 ALTER TABLE tbl MODIFY mid VARCHAR(255) COLLATE latin1_bin;

Remark: WHERE col LIKE BINARY 'value%' is a strange way to write a query. That may be why EXPLAIN doesn't do a good job with it.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Thanks! I would also like to know what would be an appropriate way to write the query instead of `WHERE col LIKE BINARY 'value%'`, which is what Django produces from its ORM (the ORM equivalent would be `col__startswith='value'`). – Sidharth Samant Apr 17 '19 at 10:44
0

Use the same collation for all of these:

  • connecting to the database
  • bytes in the client
  • the column being tested.

Probably merchantId is (or _should be) CHARACTER SET ascii. This may be at odds with other strings (utf8mb4 being preferred for names, etc).

But first, how are you connecting and what is the table definition?

Rick James
  • 135,179
  • 13
  • 127
  • 222