What kind of performance impact can I expect with MySQL WITH statements?

Question

I'm using a data analysis tool called Metabase and often compose "Native" questions (creating reports from pure MySQL).

To help keep my code organized and readable I might do something like:

with 
    subscribed_users as (
        select user.id, user.name, user.email from users inner join subscribers on subscribers.user_id = users.id
    ),
    non_trial_projects_created_since_2018 as (
        select projects.name, projects.size, users.name, users.email
        from projects
        inner join subscribed_users on subscribed_users.id = projects.owner_id
        where year(project.created_at) >= 2018   
    ),
    1_perhaps_another_alias_nesting_the_other_aliases as (...),
    2_perhaps_another_alias_nesting_the_other_aliases as (...),
    3_perhaps_another_alias_nesting_the_other_aliases as (...),
    finally_my_results_query as (...),
    results as (select * from finally_my_results_query)

select * from results

The results alias is redundant but can be useful when attempting to debug/filter the results without modifying the original queries. It's also useful if I need to debug any of the aliased queries by using select * from alias.

I noticed that when I bypass results and call finally_my_results_query directly, the query performance is noticeably improved.

Coming from an object-oriented programming background, I like assigning this stuff names and avoiding nested, ugly queries. What's actually going on when I implement WITH that impacts performance and are there alternative ways to accomplish what I want other than nesting queries?

score 0 · Answer 1 · answered Nov 25 '20 at 16:53

Around 2016 the MySQL Optimizer team made a presentation at a conference about the new features they were developing for MySQL 8.0. They got to the common table expression (the WITH statement), which allows recursive queries, among other features.

This was an impressive game-changer for MySQL. MySQL was late to the game implementing recursive queries (see my answer to MySQL "WITH" clause), so this was going to put MySQL in the modern era with respect to support for standard SQL.

I asked the presenter "Did you implement common table expressions using temporary tables?" The manager of the MySQL Optimizer team quietly nodded her head.

The implications are that the performance of temporary tables is known to be a performance problem for MySQL. Many queries that use GROUP BY or UNION already create temporary tables, and you can see this when they report "Using temporary" in the EXPLAIN report for the query.

It's a lot of work for MySQL to build a temporary table, fill it with rows, and then read those rows. It's all code internal to MySQL that does that, using the same storage engine interface that any other non-temporary table does. Even though temporary tables are often small enough to be stored in RAM without needing to use the disk, it costs some amount of overhead.

As far as I know, each common table expression needs to store its results in a temporary table. Then subsequent CTE's or the final SELECT statement after the CTE's can query rows from that temporary table. The CTE's temporary table can even be referenced multiple times in your query.

But how much will this affect your query? It depends on a lot of factors.

How much data is the CTE processing
How complex each CTE is
How performant is your server

It may not be enough overhead to be a problem. The query may be costly in other ways, so the relative overhead of the temporary tables compared to the query itself is small. But it's certainly greater than zero overhead.

Also, the tradeoff as with many things is between an abstract and industry-standard query language that you as the developer can use to stay productive, versus writing optimal code that is much harder to develop and harder to maintain. Sometimes using the solution that is not quite optimal when you run it gives you something back, in that you can save a lot of time.

Your employer likes that because you can get the project done on schedule.

What kind of performance impact can I expect with MySQL WITH statements?

1 Answers1