0

What is the difference between the two SQL queries below other than Query2 returning an additional field? Are there any possible scenarios where the output of the two queries would be different (other than the additional field in Query2)

Query1:

SELECT Field1, COUNT(*)
FROM Table1
GROUP BY Field1
HAVING COUNT(*) > 1

Query2:

SELECT Field1, Field2, COUNT(*)
FROM Table1
GROUP BY Field1, Field2
HAVING COUNT(*) > 1
slayernoah
  • 4,382
  • 11
  • 42
  • 73

2 Answers2

4

Absolutely, these are different. Query2's Group By clause specifies an extra field. That means when the results are aggregated, they will be aggregated for the combined unique values of Field1 AND Field2. That is, two records are aggregated if and only if both Field1 and Field2 are equal.

For example:

SELECT Profession, Count(*)
FROM People
GROUP BY Profession
HAVING Count(*) > 1

will return a list of professions with associated counts like:

Software Developer, 10
PM, 5
Tester, 2

whereas:

SELECT Profession, Gender, Count(*)
FROM People
GROUP BY Profession, Gender
HAVING Count(*) > 1

will return a list of professions broken out by gender like:

Software Developer, Male, 5
Sofware Developer, Female, 5
PM, Male, 3
PM, Female, 2
Tester, Male, 2

Edit with additional requested information:

You can retrieve counts of professions with rows for both genders via:

SELECT Profession, Count(*)
FROM People
GROUP BY Profession
HAVING SUM(case Gender when 'Female' then 1 else 0 end) > 0 AND SUM(case Gender when 'Male' then 1 else 0 end) > 0

It gets a bit hairy (need subqueries) if you also need associated gender counts

ChiralMichael
  • 1,214
  • 9
  • 20
  • Thanks a lot for taking the time to explain. You are awesome. Is there any way, i can return ONLY the rows that are broken down by Field2? Ex: In the last set of results in your answer, how would i return only the professions for which there are BOTH male and female data? (i.e. in the example, the row for Tester will not be returned since it only has data for Male) – slayernoah Feb 05 '16 at 22:36
  • Sure, I updated my answer with the additional information – ChiralMichael Feb 05 '16 at 22:53
  • Thanks! But is there any way we could do it without having to specify the possible values for Field2 in the code? In a table that i have, there are many possible values for Field2 – slayernoah Feb 05 '16 at 22:59
0

Extra group by clause in query 2 filters records.To know more look at below example.

test data:

id name 
1   a
2   b
3   a
4   a

So when I say group by name,sql first filters out distinct records for name which goes like below for the below query

select name,sum(id)
from test
group by name

--first filter out distinct values for group by column (here name)

a

b

--next for each distinct record ,how many values fall into that category..

a   1  a
    4  a
    3  a

b  2  b

So from the above groups ,now you can calculate any aggregations on the group in our case,it is sum,so next output will go some thing like this

a  8

b 2  

As you can see from above output,you also can calculate,any aggregation on group (here a and b values) ,like give me count(id),len(name) on group like below

select name,len(name),sum(id)
from test
group by name

The same thing happens when you group by another field,lets say like below

select id,name
from 
test
group by id,name

so in above case,sql first filters alldistinct records for id,name

1  a
2  b
3  a
4  a

next step is to get records which fall for each group

groupby columns  --columns which fall into this
1  a                1  a
2  b                2  b
3  a                3  a
4  a                4  a

Now you can calculate aggergations on above groups.hope this helps in visualizing your group by.further having will eliminate groups after group by phase,where will eliminate record before group by phase

TheGameiswar
  • 27,855
  • 8
  • 56
  • 94