2

I have a very slow query where the first part is created by a gem (https://github.com/CanCanCommunity/cancancan, it creates the select and the inner query) and where I add an ORDER BY and LIMIT for a cursor based pagination.

SELECT `spree_products`.*
FROM `spree_products`
WHERE `spree_products`.`id` IN
    (SELECT `spree_products`.`id`
     FROM `spree_products`
     LEFT OUTER JOIN `spree_vendors` ON `spree_vendors`.`id` = `spree_products`.`vendor_id`
     WHERE `spree_vendors`.`active` = TRUE)
ORDER BY `spree_products`.`id` ASC
LIMIT 50;

=> 50 rows in set (1 min 3.48 sec)

This are the tables:

CREATE TABLE `spree_products` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `available_on` datetime DEFAULT NULL,
  `permalink` varchar(255) DEFAULT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  `count_on_hand` int(11) DEFAULT NULL,
  `vendor_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_spree_products_on_vendor_id` (`vendor_id`)
) ENGINE=InnoDB AUTO_INCREMENT=37209248 DEFAULT CHARSET=utf8mb4
CREATE TABLE `spree_vendors` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `active` tinyint(1) DEFAULT '0',
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4413 DEFAULT CHARSET=utf8mb4

(I removed unneccessary fields to keep it tidy)

The EXPLAIN on the query above returns this:

+----+-------------+----------------+------------+--------+-------------------------------------------+-----------------------------------+---------+--------------------------------+------+----------+----------------------------------------------+
| id | select_type | table          | partitions | type   | possible_keys                             | key                               | key_len | ref                            | rows | filtered | Extra                                        |
+----+-------------+----------------+------------+--------+-------------------------------------------+-----------------------------------+---------+--------------------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | spree_vendors  | NULL       | ALL    | PRIMARY                                   | NULL                              | NULL    | NULL                           | 3465 |    10.00 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | spree_products | NULL       | ref    | PRIMARY,index_spree_products_on_vendor_id | index_spree_products_on_vendor_id | 5       | _hubert_test.spree_vendors.id  | 8613 |   100.00 | Using index                                  |
|  1 | SIMPLE      | spree_products | NULL       | eq_ref | PRIMARY                                   | PRIMARY                           | 4       | _hubert_test.spree_products.id |    1 |   100.00 | NULL                                         |
+----+-------------+----------------+------------+--------+-------------------------------------------+-----------------------------------+---------+--------------------------------+------+----------+----------------------------------------------+

When I remove the ORDER BY the query is fast:

SELECT `spree_products`.*
FROM `spree_products`
WHERE `spree_products`.`id` IN
    (SELECT `spree_products`.`id`
     FROM `spree_products`
     LEFT OUTER JOIN `spree_vendors` ON `spree_vendors`.`id` = `spree_products`.`vendor_id`
     WHERE `spree_vendors`.`active` = TRUE)
LIMIT 50;

=> 50 rows in set (0.00 sec)

When I keep the ORDER BY part from the outer query, but remove the WHERE part from the sub query, the query also is fast:

SELECT `spree_products`.*
FROM `spree_products`
WHERE `spree_products`.`id` IN
    (SELECT `spree_products`.`id`
     FROM `spree_products`
     LEFT OUTER JOIN `spree_vendors` ON `spree_vendors`.`id` = `spree_products`.`vendor_id`)
ORDER BY `spree_products`.`id` ASC
LIMIT 50;

I tried adding a composite index to spree_vendors.id / spree_vendors.active, but that didn't help.

Any idea, on how to optimise this query?

UPDATE 1:

A JOIN Variant of this is also slow. The DISTINCT is added by the gem to prevent duplicate records in case you don't select all columns:

SELECT DISTINCT `spree_products`.*
FROM `spree_products`
LEFT OUTER JOIN `spree_vendors` ON `spree_vendors`.`id` = `spree_products`.`vendor_id`
WHERE `spree_vendors`.`active` = TRUE
ORDER BY `spree_products`.`id` ASC
LIMIT 50;

=> 50 rows in set (1 min 43.13 sec)

Without the DISTINCT the query is fast.

UPDATE 2

It was pointed out, that using a LEFT OUTER JOIN inside the sub query returns the whole table. But when using an INNER JOIN it still is slow:

SELECT `spree_products`.*
FROM `spree_products`
WHERE `spree_products`.`id` IN
    (SELECT `spree_products`.`id`
     FROM `spree_products`
     INNER JOIN `spree_vendors` ON `spree_vendors`.`id` = `spree_products`.`vendor_id`
     WHERE `spree_vendors`.`active` = TRUE)
ORDER BY `spree_products`.`id` ASC
LIMIT 50;

=> 50 rows in set (1 min 3.98 sec)
23tux
  • 14,104
  • 15
  • 88
  • 187
  • The problem is WHERE IN, not ORDER BY. Rewrite to INNER JOIN or at least WHERE EXISTS. If your framework cannot then use raw SQL. – Akina Nov 17 '20 at 08:01
  • PS. By the text logic your WHERE IN and the whole subquery must be removed at all - it simply checks that `spree_products.id` is not NULL. – Akina Nov 17 '20 at 08:02
  • ...which is the same as 'inner join' – Strawberry Nov 17 '20 at 08:12
  • And note that LIMIT without ORDER BY is fairly meaningless – Strawberry Nov 17 '20 at 08:13
  • I've updated the question and added a JOIN version of it. @Akina what do you mean by "it simply checks..."? The sub-query selects all `spree_products` that have a `spree_vendor` that has `active = TRUE`, or am I missing something? – 23tux Nov 17 '20 at 08:14
  • Your subquery returns ALL `spree_products.id` present in a table. So you have `WHERE spree_products.id IN (complete spree_products.id list)` which is true until `spree_products.id` is NULL. – Akina Nov 17 '20 at 08:18
  • [Here is an interesting answer](https://stackoverflow.com/a/47097813/14648997) on a similar subject – mbesson Nov 17 '20 at 08:19
  • In updated query - WHERE condition converts LEFT joining to INNER JOIN. – Akina Nov 17 '20 at 08:19
  • @Akina ahh, now I see your point! But when I change the sub-query to use an inner join, it still is slow, see the updated question – 23tux Nov 17 '20 at 08:25
  • Incidentally, the LEFT JOIN is interpreted as an INNER JOIN in this instance (because of the WHERE clause), so it doesn't return the whole table. (Oh essentially a repeat of what Akina said above) – Strawberry Nov 17 '20 at 08:55

1 Answers1

0

Given that id must be PRIMARY, your query must be functionally identical to this:

SELECT [DISTINCT] p.*
  FROM spree_products p
  JOIN spree_vendors v
    ON v.id = p.vendor_id
 WHERE v.active = 1
 ORDER 
    BY p.id ASC
 LIMIT 50;

This would benefit from an index on p.vendor_id, and perhaps v.active.

Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • close, the framework adds a `SELECT DISTINCT` to it. Without the distinct the query is fast, otherwise slow – 23tux Nov 17 '20 at 08:21
  • Do you have the index? – Strawberry Nov 17 '20 at 08:24
  • Running an explain on your query and adding a DISTINCT, it returns this `| 1 | SIMPLE | p | NULL | index | index_spree_products_on_vendor_id | PRIMARY | 4 | NULL | 499 | 100.00 | Using where; Using temporary |` and this index `| 1 | SIMPLE | v | NULL | eq_ref | PRIMARY,index_spree_vendors_on_id_and_active | PRIMARY | 4 | _my_table.p.vendor_id | 1 | 10.00 | Using where; Distinct |` – 23tux Nov 17 '20 at 08:30
  • there is an index on `p.vendor_id`, and I tried adding one to `v.active`, it makes no difference, the query is still slow. – 23tux Nov 17 '20 at 08:37
  • I'm stumped then. This is a tiny dataset; results should be instantaneous, with or without ORDER BY. – Strawberry Nov 17 '20 at 08:40
  • oh sorry, that was a copy-paste error, I've updated it – 23tux Nov 17 '20 at 08:50