0

This is a practice question from stratascratch and I'm literally stuck at the final HAVING statement.

Problem statement:

Find the total number of downloads for paying and non-paying users by date. Include only records where non-paying customers have more downloads than paying customers. The output should be sorted by earliest date first and contain 3 columns date, non-paying downloads, paying downloads.

There are three tables:

ms_user_dimension (user_id, acc_id)
ms_acc_dimension (acc_id, paying_customer)
ms_download_facts (date, user_id, downloads)

This is my code so far

SELECT date, 
    SUM(CASE WHEN paying_customer = 'no' THEN cnt END) AS no,
    SUM(CASE WHEN paying_customer = 'yes' THEN cnt END) AS yes
FROM (
    SELECT date, paying_customer, SUM(downloads) AS cnt
    FROM ms_download_facts d
    LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
    LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
    GROUP BY 1, 2
    ORDER BY 1, 2
) prePivot
GROUP BY date
HAVING no > yes;

If I remove the HAVING no > yes at the end, the code will run and I can see I have three columns: date, yes, and no. However, if I add the HAVING statement, I get the error "column "no" does not exist...LINE 13: HAVING no > yes"

Can't figure out for the sake of my life what's going on here. Please let me know if anyone figures out something. TIA!

Tr3cky
  • 47
  • 3
  • 5
  • 5
    Which database is it? SQL Server, Oracle, MySQL? If it is SQL server, you cant use the computed column in the group by/havig, you may have to use the full expression that you used in the SELECT statement. – shahkalpesh Jan 03 '21 at 18:35
  • @shahkalpesh You're right on! The practice site doesn't specify what database engine unfortunately. Is this strictly a SQL server issue from your experience? – Tr3cky Jan 03 '21 at 18:41
  • @Kayx23: No, it isn't. This is the normal behavior. In standard SQL the `HAVING` clause gets evaluated before the `SELECT` clause and can hence not know the aliases. Some DBMS extend the standard here, though, and allow the aliases in the `GROUP BY` and `HAVING` clauses. See also https://stackoverflow.com/questions/3841295/sql-using-alias-in-group-by – Thorsten Kettner Jan 03 '21 at 19:49

2 Answers2

1

You don't need a subquery for this:

SELECT d.date, 
       SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) AS no,
       SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END) AS yes
FROM ms_download_facts d LEFT JOIN
     ms_user_dimension u
     ON d.user_id = u.user_id LEFT JOIN
     ms_acc_dimension a
     ON u.acc_id = a.acc_id
GROUP BY d.date
HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN d.downloads END) >  SUM(CASE WHEN a.paying_customer = 'yes' THEN d.downloads END);

You can simplify the HAVING clause to:

HAVING SUM(CASE WHEN a.paying_customer = 'no' THEN 1 ELSE -1 END) > 0

This version assumes that paying_customer only takes on the values 'yes' and 'no'.

You may be able to simplify the query further, depending on the database you are using.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

It doesn't like aliases in the having statement. Replace no with:

SUM(CASE WHEN paying_customer = 'no' THEN cnt END)

and do the similar thing for yes.

SELECT date, 
    SUM(CASE WHEN paying_customer = 'no' THEN cnt END) AS no,
    SUM(CASE WHEN paying_customer = 'yes' THEN cnt END) AS yes
FROM (
    SELECT date, paying_customer, SUM(downloads) AS cnt
    FROM ms_download_facts d
    LEFT JOIN ms_user_dimension u ON d.user_id = u.user_id
    LEFT JOIN ms_acc_dimension a ON u.acc_id = a.acc_id
    GROUP BY 1, 2
    ORDER BY 1, 2
) prePivot
GROUP BY date
HAVING SUM(CASE WHEN paying_customer = 'no' THEN cnt END) > SUM(CASE WHEN paying_customer = 'yes' THEN cnt END);
AndrewGraham
  • 310
  • 1
  • 8
  • Thank you so much! That has worked. Wow this is so verbose... What's the point of having aliases if we can't use them? Do you by chance know any workaround? – Tr3cky Jan 03 '21 at 18:39
  • @Kayx23: these column aliases will be used as column names, if you have some program fetching that SQL result set. Or you can use them as column names if you're referencing the `SELECT` as a subquery – marc_s Jan 03 '21 at 19:28
  • 2
    @Kayx23 A workaround would be to have everything except the HAVING clause in a derived table expression. Then you can reference the aliases. Basically `SELECT ... FROM (SELECT date, ...) AS t HAVING t.no > t.yes`. Aliases are assigned after `GROUP BY` and `HAVING` are processed, so that's why you can't access them. – siride Jan 03 '21 at 19:45