Selecting fields that are not in GROUP BY

Question

I am working with group by in SQL Server. I want to select fields that shouldn't be part of group by.

I want to select DishId but it shouldn't be in group by clause. If I add the DishId into group by it repeats the range as shown following.

Output of following query is here

SELECT  v.range, COUNT(*) AS 'occurrences',COUNT(DishID) * SUM(OrderQty) AS [TotalOrders],po.DishID 
  FROM ( SELECT PgrId,CASE WHEN DATEDIFF(YEAR, p.PgrDOB, GETDATE()) >= 0  AND DATEDIFF(YEAR, p.PgrDOB, GETDATE()) < 10 THEN '0-9'   
                     WHEN DATEDIFF(YEAR, p.PgrDOB, GETDATE()) >= 10 AND DATEDIFF(YEAR, p.PgrDOB, GETDATE()) < 20 THEN '10-19'   
                     WHEN DATEDIFF(YEAR, p.PgrDOB, GETDATE()) >= 20 AND DATEDIFF(YEAR, p.PgrDOB, GETDATE()) < 30 THEN '20-29'   
                     WHEN DATEDIFF(YEAR, p.PgrDOB, GETDATE()) >= 30 AND DATEDIFF(YEAR, p.PgrDOB, GETDATE()) < 40 THEN '30-39'   
                     ELSE '40+' 
                END AS 'range'  
               FROM Passenger p 
              ) v inner join PassengerOrder po on v.PgrId  = po.PgrID   
 GROUP BY v.range,po.DishID

Is this your complete code, because it starts with from not select. also, I don't see any aggregation happening in your select, so you don't need a group by clause, if you are not performing any aggregation operation then use distinct in select, it will give you the grouped(unique) values. — Moulitharan M, Jan 28 '22 at 02:32
Is the range repeating because there are _different_ values of `DishID`? If you don't want a different row for each `range/DishID` combination, then you need to decide which `DishID` value you want. You can use `MIN` or MAX` or other things like `FIRST_VALUE` but you can't just add it to the `GROUP BY` if you don't want all the distinct values. Create a [db<>fiddle](https://sqlblog.org/fiddle) with some sample data and explain which `DishID` you want to show for `40+` and why. — Aaron Bertrand, Jan 28 '22 at 02:48
You can either select a single value: first, last, some other criteria. Or you can aggregate all the values using `STRING_AGG` — Charlieface, Jan 28 '22 at 03:01
here is the dbFiddle link https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=7d7eee56b6f242b159c0bcd5502060ef — Shahab khan, Jan 28 '22 at 03:29
I want to know which age group buys which dish the most. for example youngster like the dish0001 the most. old people like dish0003 the most etc — Shahab khan, Jan 28 '22 at 03:30
As per the question guide, please do not post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text. — Dale K, Jan 28 '22 at 04:37
Something like this basically https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=3d37659713c6d8db70d671fa832079f6. I must say, `COUNT(DishID) * SUM(OrderQty)` looks a bit suspect, firstly `COUNT(DishID)` is the same as `COUNT(*)`, also logically you would just need `SUM(OrderQty)` if you wanted total orders — Charlieface, Jan 28 '22 at 05:04

score 4 · Answer 1 · answered Jan 28 '22 at 03:59

Sounds like you want the highest dish in each age range, ordered by total sales descending. There are many other improvements that could be made here but the simplest way is to just generate a row number per range ordered by total sales descending, and wrap that in a CTE, filtered by the first row number.

;WITH cte AS
(
  SELECT v.range, 
   COUNT(*) AS occurrences,
   COUNT(DishID) * SUM(OrderQty) AS TotalOrders,
   po.DishID,
   rn = ROW_NUMBER() OVER (PARTITION BY v.range 
        ORDER BY COUNT(DishID) * SUM(OrderQty) DESC)
  FROM 
  ( SELECT PgrId,
    CASE WHEN p.PgrDOB < DATEADD(YEAR, -40, GETDATE()) THEN '40+'
         WHEN p.PgrDOB < DATEADD(YEAR, -30, GETDATE()) THEN '30-39'
         WHEN p.PgrDOB < DATEADD(YEAR, -20, GETDATE()) THEN '20-29'
         WHEN p.PgrDOB < DATEADD(YEAR, -10, GETDATE()) THEN '10-19'
         ELSE '0-9'
    END AS range
    FROM dbo.Passenger p    
 ) v inner join dbo.PassengerOrder po on v.PgrId  = po.PgrID    
 GROUP BY v.range,po.DishID 
)
SELECT range, occurrences, TotalOrders, DishID
  FROM cte
  WHERE rn = 1
  ORDER BY TotalOrders DESC;

Output (shown in this db<>fiddle):

range	occurrences	TotalOrders	DishID
40+	2	220	dsh0000001
30-39	2	84	dsh0000001
10-19	1	11	dsh0000001
20-29	1	1	dsh0000001
0-9	1	1	dsh0000001

I fixed a couple of other things, too:

Always use schema name for objects.
Don't use 'single quotes' to delimit column aliases; this makes them too easy to confuse with string literals. Use [square brackets] instead, but don't delimit at all unless you need to. (I talk a little about that here.)
Your DATEDIFF calculation was not accurate to a person's birthday, nor is mine, but if you flip the order you can make a much less complex CASE expression, and when you move functions away from the column, you make it more likely that the query could benefit from a current (or future) index. Lots more at Dating Responsibly.

Selecting fields that are not in GROUP BY

1 Answers1