1

I've seen in other questions that the difference between HAVING and WHERE in SQL is that HAVING is used post-aggregation whereas WHERE is used pre-aggregation. However, I am still unsure about when to use pre-aggregation filtering or post-aggregation filtering.

As a concrete example, why don't these two queries yield the same result (the second sums quantity prematurely in a way that squashes the GROUP BY call)?

Using WHERE to obtain number of condo sales of each real estate agent.

SELECT agentId, SUM(quantity) total_sales 
  FROM sales s, houses h
  WHERE s.houseId = h.houseId AND h.type = "condo"
  GROUP BY agentId
  ORDER BY total_sales;

Attempted use of HAVING to obtain the same quantity as above.

SELECT agentId, SUM(quantity) total_sales 
  FROM sales s, houses h
  GROUP BY agentId
  HAVING s.houseId = h.houseId AND h.type = "condo"
  ORDER BY total_sales;

Note: these were written/tested/executed in sqlite3.

bashfuloctopus
  • 348
  • 3
  • 13

3 Answers3

1

WHERE filters rows from the database. Then, if the query has aggregation, aggregation is ran based on the aggregate functions and GROUP BY clause in the query. After that point, HAVING is applied to filter the grouping results. The only filtering that HAVING allows is filtering on GROUP BY columns or calculated aggregates.

I must assume that you're using MySQL for your example query since, as other answers have noted, your HAVING clause doesn't make sense and MySQL has some default behaviors which are occasionally problematic and confusing.

Donnie
  • 45,732
  • 10
  • 64
  • 86
1

The simple way to think about it is to consider the order in which the steps are applied.

Step 1: Where clause filters data

Step 2: Group by is implemented (SUM / MAX / MIN / ETC)

Step 3: Having clause filters the results

So in your 2 examples:

SELECT agentId, SUM(quantity) total_sales 
FROM sales s, houses h
WHERE s.houseId = h.houseId AND h.type = "condo"
GROUP BY agentId
ORDER BY total_sales;

Step 1: Filter by HouseId and Condo

Step 2: Add up the results (number of houses that match the houseid and condo)

SELECT agentId, SUM(quantity) total_sales 
FROM sales s, houses h
GROUP BY agentId
HAVING s.houseId = h.houseId AND h.type = "condo"
ORDER BY total_sales;

Step 1: No Filter

Step 2: Add up quantity of all houses

Step 3: Filter the results by houseid and condo.

Hopefully this clears up what is happening.

The easiest way to decide which you should use is: - Use WHERE to filter the data - Use HAVING to filter the results of an aggregation (SUM / MAX / MIN / ETC)

Mathew Paxinos
  • 944
  • 7
  • 16
0

First, learn to use proper, explicit, standard JOIN syntax.

Second, your query should look like:

SELECT s.agentId, SUM(s.quantity) as total_sales 
FROM sales s JOIN
     houses h
     ON s.houseId = h.houseId
WHERE h.type = 'condo'
GROUP BY s.agentId
ORDER BY total_sales;

Your version of the query should generate an error in any reasonable database, because the HAVING clause has columns that are neither GROUP BY keys nor aggregation functions.

Additional notes:

  • The delimiter for a string is single quotes. If you use double quotes, things may not work as you expect.
  • You should qualify all column references, especially when your query references more than one table.
  • JOIN conditions belong in the ON clause, not in a WHERE clause.
  • Filtering on h.type after the aggregation makes no sense. If it did work, the sum() would include non-condos because the filtering is happening too late.
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 6
    You red-penned OP's example query, but made no attempt at all to answer the question. If the example query is so problematic as to make the question unclear, then tell OP so in a comment. But writing a lengthy response that teaches OP everything except what they actually asked about seems inappropriate to me. – mypetlion Nov 27 '18 at 22:43
  • Thank you, Gordon. This is helpful, but doesn't solve my confusion, so I'll try to re-phrase. What determines when `SUM(quantity)` is executed? Is it always executed after `WHERE ... GROUP BY ...` but before `HAVING ....`? – bashfuloctopus Nov 27 '18 at 22:46
  • For example, there exists [detailed explanation](https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-true-order-of-sql-operations/) of query operator precedence, but this particular twist of it remains unclear to me. – bashfuloctopus Nov 27 '18 at 22:48
  • 2
    @bashfuloctopus . . . Your query is simply syntactically incorrect SQL. That some database somewhere might accept it is possible, but you don't specify the database, so that is speculation. `JOIN` conditions should be in `ON` clauses, not in `WHERE` or `HAVING` clauses -- and that is half the problem with either version of the query. – Gordon Linoff Nov 27 '18 at 23:00
  • It makes sense to me now: if the joins are always explicit (not implicitly injected in the WHERE) then this issue never crops up. However, in sqlite3 both queries are valid in that they don't throw errors. Moreover, while poor style choice, it is possible to do the JOIN implicitly in WHERE. What is sqlite3 doing here? – bashfuloctopus Nov 27 '18 at 23:06
  • 1
    @bashfuloctopus . . . It is choosing arbitrary values from arbitrary rows for the unaggregated columns in `having`. Basically, it is doing nothing useful and it should conform to the standard and generate an error. – Gordon Linoff Nov 27 '18 at 23:09