I'm using a data analysis tool called Metabase and often compose "Native" questions (creating reports from pure MySQL).
To help keep my code organized and readable I might do something like:
with
subscribed_users as (
select user.id, user.name, user.email from users inner join subscribers on subscribers.user_id = users.id
),
non_trial_projects_created_since_2018 as (
select projects.name, projects.size, users.name, users.email
from projects
inner join subscribed_users on subscribed_users.id = projects.owner_id
where year(project.created_at) >= 2018
),
1_perhaps_another_alias_nesting_the_other_aliases as (...),
2_perhaps_another_alias_nesting_the_other_aliases as (...),
3_perhaps_another_alias_nesting_the_other_aliases as (...),
finally_my_results_query as (...),
results as (select * from finally_my_results_query)
select * from results
The results
alias is redundant but can be useful when attempting to debug/filter the results without modifying the original queries. It's also useful if I need to debug any of the aliased queries by using select * from alias
.
I noticed that when I bypass results
and call finally_my_results_query
directly, the query performance is noticeably improved.
Coming from an object-oriented programming background, I like assigning this stuff names and avoiding nested, ugly queries. What's actually going on when I implement WITH
that impacts performance and are there alternative ways to accomplish what I want other than nesting queries?