optimize select rows based on previous select result

Question

I have a fields table like this:

product_id  |aid    |value|
------------|-------|-----|
789         |6      |1    |
789         |6      |3    | -->aid = 6 , value = 3
789         |      8|    8| -->rows that i want with aid 8
789         |      8|   11| -->rows that i want with aid 8
789         |      8|   82| -->rows that i want with aid 8
------------|-------|-----|
790         |6      |2    |
790         |6      |3    | -->aid = 6 , value = 3
790         |6      |4    |
790         |      8|    8| -->rows that i want with aid 8
790         |      8|   16| -->rows that i want with aid 8
------------|-------|-----|
791         |6      |7    |
791         |8      |13   |
------------|-------|-----|

I want all rows with aid = 8 for specific products that have pair of (aid = 6 AND value = 3) what i did: First i select distinct product_ids having aid = 6and value = 3.
Then i select all rows with aid where product_ids are IN previous select query. here is my query which take about 1 second.

SELECT DISTINCT `value` FROM `fields`
WHERE aid = 8 AND product_id IN 
(
    SELECT DISTINCT `fields`.product_id FROM `fields` 
    WHERE aid = 6 AND `value` = 3
)

the value result is 8,11,82,16
is there a more optimized way to do this?

And how large is the number of matching rows (before the distinct)? — Gordon Linoff, Dec 11 '16 at 22:58
based on this specefic query ~0.7 K results and ~0.4K with distinct — osyan, Dec 11 '16 at 23:08
My guess is that a significant chunk of time is spent on the `select distinct` for the 700 rows. MySQL does have overhead when sorting. Although 700 rows is not very big, it might be a few hundred milliseconds (on your system), eating up a lot of time. — Gordon Linoff, Dec 11 '16 at 23:13
Your query looks fine (unless you have a very special data distribution different from your example). Judging from your timings, it seems you should try the 2 indexes `(product_id, aid, value)` AND `(aid, value, product_id)`. These are not the same as your index `(product_id,aid,value)` that you mentioned in a comment to Gordons answer when he proposed the 2nd one! So please add your explain output just to make sure neither you or we misunderstand something. Each of these indexes should improve your query about 50% (+/- x% depending on your data) compared to not having it. — Solarflare, Dec 12 '16 at 02:08
@Solarflare, thanks, playing with indexes you and Gordon said, now i have a better performance. — osyan, Dec 12 '16 at 23:24

score 1 · Accepted Answer · answered Dec 11 '16 at 19:54

1

First, the distinct in the subquery should be unnecessary. I'm not sure if MySQL optimizes it away. So, start with:

SELECT DISTINCT f.`value`
FROM `fields` f
WHERE f.aid = 8 AND
      f.product_id IN (SELECT f2.product_id
                       FROM `fields` f2
                       WHERE f2.aid = 6 AND f2.`value` = 3
                      );

For this query, you want an index on fields(aid, value, product_id).

In earlier versions of MySQL, it would be better to replace the IN subquery with EXISTS. If your query finishes in one second now, then you are probably on a more recent version.

answered Dec 11 '16 at 19:54

Gordon Linoff

1,242,037
58
646
786

thanks, about 7% better but i need something more optimized, if it could be – osyan Dec 11 '16 at 20:18
@osyan . . . You have exactly that 3-part index? – Gordon Linoff Dec 11 '16 at 22:55
yes i have `fields_product_id_aid_value_index` on `product_id,aid,value` and `fields_aid_foreign` on `aid` – osyan Dec 11 '16 at 23:00
for more: i also have an `id` column as primary key in `fields` – osyan Dec 11 '16 at 23:03
Gordon Linoff: Thanks Bro. – osyan Dec 12 '16 at 23:25

Matt · Answer 2 · 2016-12-11T20:49:36.020

0

Gordon showed you the IN method you may also want to consider EXISTS and or JOIN methods if you end up needing to tweak performance all 3 have different advantages and disadvantages depending on your data size and complexity.

EXISTS just uses a correlated sub query

SELECT f.*
FROM
    `fields` f
WHERE
    f.aid = 8
    EXISTS (
       SELECT 1
       FROM `fields` f2
       WHERE
          f2.aid = 6
          AND f2.`value` = 3
          AND f1.product_id = f2.product_id)

For the join method keeping distinct in this case would make since if 6 & 3 could be represented more than once.

SELECT f.*
FROM
    `fields` f
    INNER JOIN (
       SELECT DISTINCT `fields`.product_id FROM `fields` 
       WHERE aid = 6 AND `value` = 3
    ) t
    ON f.product_id = t.product_i
WHERE
    f.aid = 8

edited Dec 11 '16 at 20:49

answered Dec 11 '16 at 20:10

Matt

13,833
2
16
28

getting 1.6 AND 2.3 Seconds with this two methods – osyan Dec 11 '16 at 20:27
@osyan I saw your comments on Grodon's and here I guess I missed the part about optimized if IN and EXISTS are performing the exact same that would be interesting but it suggests that the subquery results are not tiny and not huge. http://stackoverflow.com/questions/14190788/subqueries-with-exists-vs-in-mysql But IN, EXISTS, JOIN are the 3 ways so if if you need to optimize it more than the 1.6 you could need indexes or any number of other things. do explain plan and post so others can comment otherwise we don't know what is going on in your environment. – Matt Dec 11 '16 at 20:46

score 0 · Answer 3 · answered Dec 11 '16 at 21:14

First, add the following index to your table if you don't already have one.

ALTER TABLE fields ADD KEY (aid, product_id, value);

By the way, when asking SQL questions, it would help if you post the output of SHOW CREATE TABLE so we can see if you have already defined any indexes or constraints.

sql> SHOW CREATE TABLE fields\G
       Table: fields
Create Table: CREATE TABLE `fields` (
  `product_id` int(11) DEFAULT NULL,
  `aid` int(11) DEFAULT NULL,
  `value` int(11) DEFAULT NULL,
  KEY `aid` (`aid`,`product_id`,`value`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Next, try this query:

SELECT DISTINCT f1.value
FROM fields AS f1
INNER JOIN fields AS f2
  ON f1.product_id=f2.product_id
WHERE f1.aid=8 AND f2.aid=6 AND f2.value=3;

This uses no subqueries, only indexed lookups. We don't care about reducing the set of matching rows in f2 because that will be taken care of by the DISTINCT anyway.

Output (tested on MySQL 8.0.0-dmr):

+-------+
| value |
+-------+
|     8 |
|    11 |
|    82 |
|    16 |
+-------+

Here's the EXPLAIN report to show the optimization:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: f2
   partitions: NULL
         type: ref
possible_keys: aid
          key: aid
      key_len: 5
          ref: const
         rows: 6
     filtered: 10.00
        Extra: Using where; Using index; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: f1
   partitions: NULL
         type: ref
possible_keys: aid
          key: aid
      key_len: 10
          ref: const,test.f2.product_id
         rows: 2
     filtered: 100.00
        Extra: Using index

Both tables get the "Using index" optimization, so they are using a covering index.

Both tables are using index lookups to narrow down the number of examined rows.

There's still a temporary table which causes some overhead. But this is unavoidable because of the DISTINCT. But at least it's just one temp table, instead of multiple temp tables due to using DISTINCT in a subquery. And the temp table should be small, since it only needs to store rows that have already matched.

thanks foe your recommendations. but the result will have an extra value of 13, coming from last row of fields table. but as last product don't have a row with `aid = 6 , value = 3` so result should not contains 13 — osyan, Dec 11 '16 at 22:18
i also have an index on `aid, product_id, value`. i just want to keep it simple for more readabilty — osyan, Dec 11 '16 at 22:23
I tested the query I posted above using the example data you shared. I did not get 13 in the result. If you got 13, then you must have tested either with different data or a different query. — Bill Karwin, Dec 11 '16 at 22:40
oh sorry my bad job on wrong sample data, you are right. but running query on 1000 iteration give same execution time about 1 seconds — osyan, Dec 11 '16 at 22:47
I'm guessing that's the fault of the `DISTINCT`. Temporary tables are hard to optimize on MySQL. — Bill Karwin, Dec 11 '16 at 22:52

optimize select rows based on previous select result

3 Answers3