SQL Server: Greatest-N-per-group expanded

Question

My table looks like:

A     B     C     D
1     1     1     1
1     1     3     2
1     1     0     4
1     1     2     1     
1     2     1     0
1     2     0     2
1     2     4     5   
2     1     5     3

My goal is to, for each pair of A and B, output the value in D that corresponds to MIN(C), and the value in D that corresponds to MAX(C). The output should be

A    B    D at MIN(C)    D at MAX(C)
1    1    4              2
1    2    2              5
2    1    3              3

I know that to extract MIN(C) and MAX(C) I simply do:

SELECT A, B, MIN(C) as "minC", MAX(C) as "maxC"
FROM Table
GROUP BY A, B
ORDER BY A, B

My question is: how do I bring column D along for the ride? If I include it in the SELECT and GROUP BY clauses, it will generate the MIN(C) and MAX(C) for each D, which is not what I want. Moreover, I don't even need to output MIN(C) and MAX(C). D is all that I'm after.

The basic outline provided in SQL Select only rows with Max Value on a Column does not seem to handle this case.

Thanks in advance!

Mikael Eriksson · Accepted Answer · 2012-03-20T16:05:16.000

Your query could look something like this:

;with C as
(
  select A, B, C, D,
         row_number() over(partition by A, B order by C asc) as rn1,
         row_number() over(partition by A, B order by C desc) as rn2
  from YourTable
)
select C1.A, C1.B, C1.D as "D at MIN(C)", C2.D as "D at MAX(C)"
from C as C1
  inner join C as C2
    on C1.A = C2.A and
       C1.B = C2.B
where C1.rn1 = 1 and
      C2.rn2 = 1

The first part is a common table expression (CTE) that encapsulates a query that can be reused later in the main query. It uses row_number() to enumerate the rows within each partition. rn1 is ordered by C asc so rn1 = 1 for the min value of C and rn2 is ordered by C desc and that means that rn2 = 1 for the max value of C. The main query is using the CTE two times join on A and B columns. The where clause makes sure that we only get the rows where rn1 and rn2 is 1.

Here is a working example that uses a table variable @T instead of your table.

declare @T table
(
  A int,
  B int,
  C int,
  D int
)

insert into @T values
(1,     1,     1,     1),
(1,     1,     3,     2),
(1,     1,     0,     4),
(1,     1,     2,     1),   
(1,     2,     1,     0),
(1,     2,     0,     2),
(1,     2,     4,     5), 
(2,     1,     5,     3)

;with C as
(
  select A, B, C, D,
         row_number() over(partition by A, B order by C asc) as rn1,
         row_number() over(partition by A, B order by C desc) as rn2
  from @T
)
select C1.A, C1.B, C1.D as "D at MIN(C)", C2.D as "D at MAX(C)"
from C as C1
  inner join C as C2
    on C1.A = C2.A and
       C1.B = C2.B
where C1.rn1 = 1 and
      C2.rn2 = 1

I'm new to SQL- is there a way to do this without inserting the data into a new table? Can the "with" command be applied to an already existing table? The table I'm using for this is very large (and I believe immutable). Thanks for your help! — ddp26, Mar 20 '12 at 15:43
@ddp26 - I have updated the answer. I hope it will be clearer now. — Mikael Eriksson, Mar 20 '12 at 16:05

score 0 · Answer 2 · answered Mar 20 '12 at 14:35

0

Something like this might work:

SELECT A, B, MIN(C) as "minC", MAX(C) as "maxC", 
(SELECT TOP 1 D FROM Table [table2] WHERE table1.A = table2.A AND table1.B = table2.B AND MIN(table1.C) = table2.C) As [D]
FROM Table [table1]
GROUP BY A, B
ORDER BY A, B

answered Mar 20 '12 at 14:35

zzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

5,711
8
54
70

What do [table1] and [table2] refer to here? I have only the input table. – ddp26 Mar 20 '12 at 15:38
The both refer to your table "Table". Think of Table2 as a join that selects all the d values for the A,B,MIN(C) pairs. – zzzzzzzzzzzzzzzzzzzzzzzzzzzzzz Mar 20 '12 at 15:59

SQL Server: Greatest-N-per-group expanded

2 Answers2