0

First of all, even though this SQL: How do you select only groups that do not contain a certain value? thread is almost identical to my problem, it doesn't fully dissipate my confusion about the problem.

Let's have a table "Contacts" like this one:

+----------------------+
| Department FirstName |
+----------------------+
| 100 Thomas           |
| 200 Peter            |
| 100 Jerry            |
+----------------------+

First, I want to group the rows by the department number and show number of rows in each displayed group. This, I believe, can be easily done by the following query.

SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department

This outputs 2 groups. First with dep.no. 100 containing 2 rows, second with 200 containing only one row.

But then, I want to extend the query to exclude any group that doesn't contain certain value in certain column (e.g. Thomas in FirstName). Here are my questions:

1) Reading the above-mentioned thread I was able to come up with this, which seems to work correctly:

SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
WHERE Department IN (SELECT Department FROM Contacts WHERE FirstName = "Thomas")
GROUP BY Department

Q: How does this work? I understand the "WHERE Department IN" part, but then I'd expect a value, but instead another nested query is included, which to me doesn't make much sense as I'm only beginner with SQL.

2) By accident I was able to come up with another query that also seems to work, but feels weird and I also don't understand its workings.

SELECT Department, Count(*) As "Rows_in_group"
FROM Contacts
GROUP BY Department
HAVING NOT SUM(FirstName = "Thomas") = 0

Q: How does this work? Why alteration: HAVING SUM(FirstName = "Thomas") > 0 doesn't work?

3) Q: Is there any simple and correct way to do this using the HAVING clause?

I expected, that simple "HAVING FirstName='Thomas'" after the GROUP BY would do the trick as it seems to follow a common language, but it does not.

Note that I want the whole groups to be chosen by the query so "WHERE FirstName='Thomas'" isn't s solution for my problem as it excludes all the rows that don't satisfy the condition before the grouping takes place (at least the way I understand it).

Community
  • 1
  • 1
Demo
  • 394
  • 1
  • 4
  • 16

1 Answers1

0

Q: How does this work? I understand the "WHERE Department IN" part, but then I'd expect a value, but instead another nested query is included, which to me doesn't make much sense as I'm only beginner with SQL.

The nested query returns values which are used to match against Department

2) By accident I was able to come up with another query that also seems to work, but feels weird and I also don't understand its workings.

 HAVING NOT SUM(FirstName = "Thomas") = 0

"Feels weird" because, well, it is. This is not a place for the SUM function.

EDIT: Why does this work? The expression FirstName = "Thomas" gets evaluated as true or false (known as a Boolean expression). True numerically is equal to 1 and False converts to 0 (zero). By including SUM you then calculated the totals so really zero (still) means false and "not zero" is true. Then to make it weird(er) you included NOT which negated the whole thing and it becomes NOT TRUE = 0 or FALSE = FALSE (which is of course... TRUE)!!

EDIT: I think what could be more helpful to you is consideration of when to use WHERE and when to use HAVING (instead of the Boolean magic taking place). From this answer:

WHERE clause introduces a condition on individual rows; HAVING clause introduces a condition on aggregations, i.e. results of selection where a single result, such as count, average, min, max, or sum, has been produced from multiple rows.

WHERE was appropriate for your example because first you want to "only return rows WHERE Department IN (100)" and then you want to "group those rows by Department" and get a COUNT of how many rows had been selected.

Community
  • 1
  • 1
Morpheus
  • 1,616
  • 1
  • 21
  • 31
  • Thank you. You answered my first question, because I can understand now how the first case works (dumb me..). Let's say the second question is answered by saying that it's a nonsense syntax, but it kinda keeps me wondering why it actually works. That only leaves the 3rd question: isn't there a simple, correct and clean way to do the same thing using the HAVING clause? – Demo Oct 08 '15 at 19:14
  • 2
    You could also write the longer: `having sum(case when FirstName = 'Thomas' then 1 else 0 end) = 0` – shawnt00 Oct 08 '15 at 19:18
  • @shawnt00: that doesn't seem to work in my ms access environment (returns syntax error), but it might help in future study of SQL, so thank you anyway – Demo Oct 08 '15 at 19:56
  • @Morpheus: i'm trying to grasp your explanation why the second case works. Is it so, that the having clause checks against all the values within each group and returns 1(true) if at least one row of that group includes "Thomas" in FirstName column and 0(false) if not? Then Sum of 1 is 1 and NOT 1 = false and false=false is true as satisfying the HAVING condition? Then I don't get why "Having Sum(FirstName = 'Thomas') = 1" (omitting NOT and changing 0 to 1) doesn't work. Anyway thanks for your help and I'm selecting your answer as the best one yet. – Demo Oct 08 '15 at 20:04
  • @Demo, I thought it was MySQL and I realized Access after I posted. While I thought that Access did allow for `case` expressions I think the equivalent is `having sum(iif(FirstName = 'Thomas', 1, 0)` Can't remember if you need single or double quotes. – shawnt00 Oct 08 '15 at 20:08
  • 1
    @Demo I think that in Access boolean value true is treated as -1 and the false is treated as 0. Those are the values being summed. Try `= -1`. This is one reason it's better to be explicit. – shawnt00 Oct 08 '15 at 20:10
  • Thank you, @Demo. I've expanded the answer because it doesn't help as much to focus on `SUM(`*logical expression*`)`. It's more important to know when to use `HAVING` and when to use `WHERE`. @shawnt00 is correct about Access booleans though so it's best to use `= 0` to test for false and `<> 0` to test for true. – Morpheus Oct 08 '15 at 21:59
  • Agreed. Unfortunately this is not a place to discuss further, but I'm still missing a solution using HAVING (#3), which probably doesn't exist, but is not illogical imo. The WHERE solution (#1) makes sense but seems like a workaround. Why? WHERE prevents unwanted grouping, but HAVING applies restricting conditions on "existing" groupings. I want that. I find this more correct: SELECT Department, Count(*) FROM Contacts GROUP BY Department HAVING Department IN (the nested query), but it still lacks simplicity. I guess what I'm looking for would be something like: HAVING INCLUDES(column, value). – Demo Oct 09 '15 at 09:54