Why can HAVING reference COUNT(column_name) but not the column_name itself?

Question

I am trying to better understand how MySQL works. I came across a problem with subgroups. From question Unknown column in 'having clause', I understand why this code will return an error:

SELECT b.Title, b.Isbn
FROM Book AS b
INNER JOIN Writing AS w ON w.Book_id = b.ID
GROUP BY b.ID
HAVING w.Author_id = 1 AND b.Title LIKE "%Head%"

That error is: "Unknown column 'w.Author_id' in 'having clause'" because:

The SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions. However, MySQL supports an extension to this behavior, and permits HAVING to refer to columns in the SELECT list and columns in outer subqueries as well.

But, if instead of w.Author_id = 1, I use COUNT(w.Author_id) > 1, the code exectues and works correctly:

SELECT b.Title, b.Isbn
FROM Book AS b
INNER JOIN Writing AS w ON w.Book_id = b.ID
GROUP BY b.ID
HAVING COUNT(w.Author_id) > 1 AND b.Title LIKE "%Head%"

So, my question is: what is it about COUNT() that makes w.Author_id accessible? I apologize if this is a silly/obvious question - I'm still a novice at SQL.

COUNT() doesn't make accessible, it makes logic behind in the syntax. w.Author_id=1 and count(w.Author_id)= 1 both conditions are totally different. If you want to add condition w.Author_id=1 then you can use in where clause. In Having a clause you should use a grouped field or aggregate function. — Prasanna Kumar J, May 29 '20 at 11:46

score 3 · Accepted Answer · answered May 29 '20 at 11:42

The documentation seems pretty clear on this subject. But let me see if I can explain it better for you.

The HAVING clause is essentially a WHERE clause that "takes place" after the GROUP BY. That is, the aggregation has already happened, so the data that is available is the aggregated data.

In your example, there is no Author_Id returned by the aggregation. And MySQL doesn't know how to generate one.

However, COUNT(w.Author_Id) is an aggregated result. MySQL can just add that (conceptually) to the results returned by the aggregation and filter on it.

Your query is equivalent to:

SELECT Title, Isbn
FROM (SELECT b.Title, b.Isbn, COUNT(*) as cnt
      FROM Book b JOIN
           Writing w
           ON w.Book_id = b.ID
      GROUP BY b.ID
     ) b
WHERE cnt > 1 AND b.Title LIKE '%Head%';

That said, the query is better written as:

SELECT b.Title, b.Isbn
FROM Book b JOIN
     Writing w
     ON w.Book_id = b.ID
WHERE b.Title LIKE '%Head%'
GROUP BY b.ID
HAVING COUNT(*) > 1;

You can filter on the title before aggregating and that is usually much more efficient.

score 1 · Answer 2 · answered May 29 '20 at 11:42

When you put w.Author_id in the expression COUNT(w.Author_id) > 1 you are satisfying the requirement for HAVING clause that

HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions

while the ID column alone won't, because it is not part of the GROUP BY clause nor is within an aggregate function (which COUNT is).

Why can HAVING reference COUNT(column_name) but not the column_name itself?

2 Answers2