optimizing mysql subquery execution time

Question

I have a following query:

select
qs.*
from
    (
        select
            tsk.id,
            tsk.request_id,
            tsk.hash_id
        from
            user_tasks as usr
            inner join unassigned_tasks as tsk on usr.task_id = tsk.id
        where
            usr.assigned_to = 53
            AND tsk.product_id NOT IN (
                SELECT
                    product_id
                FROM
                    product_progresses
                WHERE
                    request_id = tsk.request_id
            )
        ORDER BY
            tsk.id
    ) as qs <-- this takes about 233ms.
WHERE
    qs.id = ( <-- should this subquery execute for every row from outer result-set row?
        SELECT
            min(qt.id)
        FROM
            (
                select
                    tsk.id,
                    tsk.request_id,
                    tsk.hash_id
                from
                    user_tasks as usr
                    inner join unassigned_tasks as tsk on usr.task_id = tsk.id
                where
                    usr.assigned_to = 53
                    AND tsk.product_id NOT IN (
                        SELECT
                            product_id
                        FROM
                            product_progresses
                        WHERE
                            request_id = tsk.request_id
                    )
                ORDER BY
                    tsk.id
            ) as qt <-- this takes about 233ms.
        WHERE
            qt.id > 83934
    )

Both qs and qt takes about the same time to execute, i.e. around 233ms.

Also, executing this whole query takes about the same time.

I have a conception that inner query inside where qs.id = (...) executes once for every row in the result-set generated from qs.

In this current case, qs outputs 10 rows.

So I decided to test if the sub-query is executed for each row 10 times.

Here's what I put inside sub-query:

WHERE qs.id = (
    SELECT
        SLEEP(1)
)

instead of where qs.id = ( select min(qt.id) ... ).

This took about 10.246 s which proves that the inner query is being run for each row.

So with this, shouldn't the initial query take about (qt time) 233 * 10 (rows from qs) = 2330ms?

Is it because select min().. is calculated once only?

I'm trying to find documentation to support this, by in my experience MySQL tends to execute every subquery first. I.e. Because `qs` is subquery, it'll be fully evaluated, rather than limited by there `where` clause. But if you move the where clause inside `qs` it will limit the selection. — Andy N, Dec 16 '19 at 10:16

Andy N · Answer 1 · 2019-12-16T10:41:15.870

In MySQL subqueries within the FROM clause are evaluated before subqueries in the WHERE clause.

Useful article: https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/

In your case, this is causing qs to be fully evaluated before the WHERE clause is factored in. If you want qs to be limited by qt you should be able to achieve that by unnesting qs or moving qt inside qs.

In practice though, rather than use a subquery for the WHERE clause, it would be probably better to use joins.

See SQL select only rows with max value on a column

optimizing mysql subquery execution time

1 Answers1