8

I have the following table in MSSQL2005

id | business_key | result
1 | 1 | 0
2 | 1 | 1
3 | 2 | 1
4 | 3 | 1
5 | 4 | 1
6 | 4 | 0

And now i want to group based on the business_key returning the complete entry with the highest id. So my expected result is:

business_key | result
1 | 1
2 | 1
3 | 1
4 | 0

I bet that there is a way to achieve that, i just can't see it at the moment.

John Saunders
  • 160,644
  • 26
  • 247
  • 397
computhomas
  • 107
  • 1
  • 2
  • 6

5 Answers5

15

An alternative solution, which may give you better performance (test both ways and check the execution plans):

SELECT
     T1.id,
     T1.business_key,
     T1.result
FROM
     dbo.My_Table T1
LEFT OUTER JOIN dbo.My_Table T2 ON
     T2.business_key = T1.business_key AND
     T2.id > T1.id
WHERE
     T2.id IS NULL

This query assumes that the ID is a unique value (at least for any given business_key) and that it is set to NOT NULL.

Tom H
  • 46,766
  • 14
  • 87
  • 128
5
select
  drv.business_key,
  mytable.result
from mytable
  inner join
  (
    select 
      business_key, 
      max(id) as max_id
    from mytable
    group by
      business_key
  ) as drv on
    mytable.id = drv.max_id
jpbochi
  • 4,366
  • 3
  • 34
  • 43
Chris Simpson
  • 7,821
  • 10
  • 48
  • 68
3

Try this

select  business_key, 
        result
from    myTable
where   id in 
        (select max(id)
        from    myTable
        group by business_key)

EDIT: I created the table to test my code. I include it below in case anybody else wants to test it.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[myTable](
    [id] [int] NOT NULL,
    [business_key] [int] NOT NULL,
    [result] [int] NOT NULL
) ON [PRIMARY]
go

insert into myTable values(1,1,0);
insert into myTable values(2,1,1);
insert into myTable values(3,2,1);
insert into myTable values(4,3,1);
insert into myTable values(5,4,1);
insert into myTable values(6,4,0);

select  * from mytable
John MacIntyre
  • 12,910
  • 13
  • 67
  • 106
  • That query will only show one row of data, not four. You need to make the sub-query correlated - and you need aliases for the two instances of MyTable (call them 'first' and 'second'); add WHERE first.id = second.id – Jonathan Leffler Jan 13 '09 at 14:17
  • Jonathan-You are correct, there was a typo in the where clause. It should be 'in' not '='. But the alias's are not necessary. Thx for pointing out my error. – John MacIntyre Jan 13 '09 at 14:32
3
select business_key, 
       result
    from 
    (select id, 
        business_key, 
        result, 
        max(id) over (partition by business_key) as max_id
    from mytable) x
where id = max_id
  • This should be the accepted answer, because this query is MUCH more efficient than accepted one. – Łukasz Wiatrak Jan 17 '13 at 11:47
  • That's hardly the case. I ran both queries across MSSQL 2012 R2 in one go, and what I found from the resulting execution plan was that the subquery part used 68% of the elapsed time. The very partition used 77% of the entire second query. – Eric Wu Mar 06 '16 at 19:00
2

This is an older post but was relevant to something I was doing currently (2013). If you get a larger dataset (typical in most DBs), the performance of the various queries (looking at execution plans) says a lot. First we create a "TALLY table" to randomly generate numbers, then use an arbitrary formula to create data for the "MyTable":

CREATE TABLE #myTable(
    [id] [int] NOT NULL,
    [business_key] [int] NOT NULL,
    [result] [int] NOT NULL,
    PRIMARY KEY (Id)
) ON [PRIMARY];

; WITH
    -- Tally table Gen            Tally Rows:     X2                X3
t1 AS (SELECT 1 N UNION ALL SELECT 1 N),    -- 4            ,    8
t2 AS (SELECT 1 N FROM t1 x, t1 y),            -- 16            ,    64
t3 AS (SELECT 1 N FROM t2 x, t2 y),            -- 256            ,    4096
t4 AS (SELECT 1 N FROM t3 x, t3 y),            -- 65536        ,    16,777,216
t5 AS (SELECT 1 N FROM t4 x, t4 y),            -- 4,294,967,296,    A lot
Tally AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) N
          FROM t5 x, t5 y)

INSERT INTO #MyTable 
SELECT N, CAST(N/RAND(N/8) AS bigINT)/5 , N%2
FROM Tally
WHERE N < 500000

Next we run three different types of queries to review the performance (turn on "Actual Execution Plan" if you are using SQL Server Management Studio):

SET STATISTICS IO ON
SET STATISTICS TIME ON
----- Try #1 
select  'T1' AS Qry, id, business_key, 
        result
from    #myTable
where   id in 
        (select max(id)
        from    #myTable
        group by business_key)

---- Try #2 
select 'T2' AS Qry, id, business_key, 
       result
    from 
    (select id, 
        business_key, 
        result, 
        max(id) over (partition by business_key) as max_id
    from #mytable) x
where id = max_id

---- Try #3 
;with cteRowNumber as (
    select id, 
        business_key, 
        result,
           row_number() over(partition by business_key order by id desc) as RowNum
        from #mytable
)

SELECT 'T3' AS Qry, id, business_key, 
       result
FROM cteRowNumber
WHERE RowNum = 1

Cleanup:

IF OBJECT_ID(N'TempDB..#myTable',N'U') IS NOT NULL 
    DROP TABLE #myTable;
    SET STATISTICS IO OFF
SET STATISTICS TIME OFF

You will find, looking at the execution plans, "Try 1" has the best "Query Cost" and lowest CPU time but "Try 3" has the least reads and CPU time is not too bad. I would recommend using a CTE method for the least reads