Why does removing the BINARY function call from my SQL query change the query plan so dramatically?

Question

I have a SQL query that looks for a specific value in a table and then does inner joins across three tables to fetch the result set. The three tables are fabric_barcode_oc, fabric_barcode_items & fabric_barcode_rolls

Initial Query

The initial version of the query is below

EXPLAIN ANALYZE
SELECT `oc`.`oc_number` AS `ocNumber` , `roll`.`po_number` AS `poNumber` ,
`item`.`item_code` AS `itemCode` , `roll`.`roll_length` AS `rollLength` ,
`roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE BINARY `roll`.`roll_number` = 'dZkzHJ_je8'

When running EXPLAIN ANALYZE on this, I get the following

"-> Nested loop inner join  (cost=468160.85 rows=582047) (actual time=0.063..254.186 rows=1 loops=1)
    -> Nested loop inner join  (cost=264444.40 rows=582047) (actual time=0.057..254.179 rows=1 loops=1)
        -> Filter: (cast(roll.roll_number as char charset binary) = 'dZkzHJ_je8')  (cost=60727.95 rows=582047) (actual time=0.047..254.169 rows=1 loops=1)
            -> Table scan on roll  (cost=60727.95 rows=582047) (actual time=0.042..198.634 rows=599578 loops=1)
        -> Single-row index lookup on oc using PRIMARY (oc_unique_id=roll.oc_unique_id)  (cost=0.25 rows=1) (actual time=0.009..0.009 rows=1 loops=1)
    -> Single-row index lookup on item using PRIMARY (item_unique_id=roll.item_unique_id_fk)  (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=1)
"

Updated Query

I then changed the query to

EXPLAIN ANALYZE
SELECT `oc`.`oc_number` AS `ocNumber` , `roll`.`po_number` AS `poNumber` ,
`item`.`item_code` AS `itemCode` , `roll`.`roll_length` AS `rollLength` ,
`roll`.`roll_utilized` AS `rollUtilized`
FROM `fabric_barcode_rolls` AS `roll`
INNER JOIN `fabric_barcode_oc` AS `oc` ON `oc`.`oc_unique_id` = `roll`.`oc_unique_id`
INNER JOIN `fabric_barcode_items` AS `item` ON `item`.`item_unique_id` = `roll`.`item_unique_id_fk`
WHERE `roll`.`roll_number` = 'dZkzHJ_je8'

and this generates the following execution plan

"-> Rows fetched before execution  (cost=0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)

The only difference between the two queries is that I removed the BINARY function call from the query. I'm confused by why the plan is so different?

Execution Times

Query 1 had an execution time of ~375ms while the second query had an execution time of ~160ms.

What is causing this difference?

UPDATE

Including the table schema definition for fabric_barcode_rolls as requested

fabric_barcode_rolls,"CREATE TABLE `fabric_barcode_rolls` (
  `roll_unique_id` int NOT NULL AUTO_INCREMENT,
  `oc_unique_id` int NOT NULL,
  `item_unique_id_fk` int NOT NULL,
  `roll_number` char(30) NOT NULL,
  `roll_length` decimal(10,2) DEFAULT '0.00',
  `po_number` char(22) DEFAULT NULL,
  `roll_utilized` decimal(10,2) DEFAULT '0.00',
  `user` char(30) NOT NULL,
  `mir_number` char(22) DEFAULT NULL,
  `mir_location` char(10) DEFAULT NULL,
  `mir_stamp` datetime DEFAULT NULL,
  `creation_stamp` datetime DEFAULT CURRENT_TIMESTAMP,
  `update_stamp` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`roll_unique_id`),
  UNIQUE KEY `roll_number` (`roll_number`),
  KEY `fabric_barcode_item_fk` (`item_unique_id_fk`),
  CONSTRAINT `fabric_barcode_item_fk` FOREIGN KEY (`item_unique_id_fk`) REFERENCES `fabric_barcode_items` (`item_unique_id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=610684 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci"

So far as I know , the BINARY string expression makes the string exactly what it appears as opposed to MySQL's case-insensitive check. Therefore, the WHERE clause from those two queries can be seen as EXACT match VS FUZZY check . — blabla_bingo, Apr 18 '22 at 06:20
Is https://stackoverflow.com/questions/8667379/using-binary-text-comparison-in-mysql-efficiency-pitfalls relevant? i.e. is the issue that you have an index on roll.roll_number, and using BINARY prevents that index being used. If so, make the column have a binary collation as explained at https://dev.mysql.com/doc/refman/8.0/en/charset-column.html, or do not use BINARY here. — mc110, Apr 18 '22 at 07:05
@mc110 Yeah, that answer is definitely relevant! And yes, we do have an index on on `roll.roll_number` and you are probably correct in implying that MySQL can't use the index on that column when I cast to binary. Probably why it does a full table scan everytime. Anyway, we decided to remove the BINARY function call and assume that barcodes are random enough that we don't need to check case. — random_coder_101, Apr 18 '22 at 08:06
Can you please [edit] your question to show us the output of `SHOW CREATE TABLE fabric_barcode_rolls` ? And, with respect, my spidey sense of "oh no there will be midnight troubleshooting sessions" tingles fiercely hearing your assumption that your case-insensitive index won't fail you and your customers. Don't do that. Change the column's collation. Seriously. — O. Jones, Apr 18 '22 at 14:05

O. Jones · Accepted Answer · 2022-04-18T18:33:00.923

Your performance difference is due to this fact: in MySQL, collations on VARCHAR() and CHAR() columns are baked into the indexes.

Edit updated to match the table definition.

Your fabric_barcode_rolls table has a column defined like this:

roll_number char(30) NOT NULL,
...
UNIQUE KEY roll_number (roll_number).

So, your WHERE ... BINARY roll.roll_number = 'dZkzHJ_je8' filter clause is not sargable: it can't use the index on that column. But WHERE ... roll.roll_number = 'dZkzHJ_je8' is sargable: it does use the index. So it's fast. But the column's default collation is case-insensitive. So, it's fast and wrong.

That can be fixed.

Notice there's no collation declaration on the column. That means it's using the table's default: utf8mb4_0900_ai_ci, a case-insensitive collation.

What you want for an ordinary barcode column is a one-byte-per-character charset and a case-sensitive collation. This would change your table to do that.

 ALTER TABLE fabric_barcode_rolls
CHANGE  roll_number 
        roll_number CHAR(30) COLLATE latin1_bin NOT NULL;

This is a multilevel win. Using the correct character set for your barcodes saves data. It makes the indexes shorter and more efficient to use. It does case-sensitive (binary-match) lookups, which themselves make indexes shorter and much more efficient to use. And it doesn't run the collision risk between barcodes with upper and lower case character sets.

Before you conclude that the collision risk is so low you don't have to worry about it, please read about the birthday paradox.

There's also a win on using binary collations, because MySQL compares whole strings with `memcmp()` instead of being forced to compare character-by-character to test character equivalency according to the column's collation. — Bill Karwin, Apr 18 '22 at 15:03
Declare it `BINARY(30)`, thereby avoiding even latin1. Furthermore, `VARBINARY(30)` looks more appropriate. — Rick James, Apr 21 '22 at 20:48

Why does removing the BINARY function call from my SQL query change the query plan so dramatically?

1 Answers1