3

This is a sample of the data that I have.

-ID-        -Rank-      -Type-      -Status-    -Amount-
1142474     2       Under Offer Approved    23
1148492     1       Present     Current     56
1148492     2       Under Offer Approved    3
2273605     1       Present     Current     24

Where the ID is the same I only want the record with the highest rank. So the end result of the query.

-ID-        -Rank-      -Type-      -Status-    -Amount-
1142474     2       Under Offer Approved    23
1148492     1       Present     Current     56
2273605     1       Present     Current     24

Now to get the original data set is an expensive operation, so I don't want to do a group by the ID and then mins the rank and then joins back onto the dataset again. Hence the query needs to do its work another way.

Cheers Anthony

vdh_ant
  • 12,720
  • 13
  • 66
  • 86

5 Answers5

8

This will work:

with temp as (
select *, row_number() over (partition by id order by rank) as rownum
from table_name
)
select * from temp where rownum = 1

Will give one record per id where rank represents the least number

Rashmi Pandit
  • 23,230
  • 17
  • 71
  • 111
3
SELECT * FROM TheTable
WHERE 1 = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Rank DESC)
Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • This was my first instinct. Maybe the OP will post some timing results. – mechanical_meat Jun 09 '09 at 06:02
  • Just for the record, in my situation this cases an error - "Windowed functions can only appear in the SELECT or ORDER BY clauses." So I have had to put the over part in a sub query and the where part in the outer query. – vdh_ant Jun 10 '09 at 01:56
1
select t1.id
       , t1.rank
       , t1.type
       , t1.status
       , t1.amount

from   my_table t1 

       left outer join my_table as t2 
       on t1.id = t2.id 
    and 
       t2.rank < t1.rank 

where  t2.id is null
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223
  • @Anthony: the join to t2 is also known as a self-join. It is another copy of the same table. The reason it works is that we specify in the predicate (the join conditions and the WHERE clause) that we want to exclude everything but the top-ranked item for each id. – mechanical_meat Jun 09 '09 at 06:28
  • @adam: The problem is that to get the data out of my_table is very expensive (i.e. between 2 to 6 seconds) hence I would like to avoid joining onto the table again... – vdh_ant Jun 09 '09 at 06:52
  • What I have done (even though I really don't want to), I have put the results into a temp table and then joining the temp table onto itself... – vdh_ant Jun 09 '09 at 07:00
  • Well, that may be the way to go. Have a look at what another SO user has to say about decomposing your complicated queries into steps: http://stackoverflow.com/questions/754527/best-way-to-test-sql-queries/754570#754570 – mechanical_meat Jun 09 '09 at 07:04
  • Have you tried Alex's solution yet? http://stackoverflow.com/questions/968305/advanced-grouping-without-using-a-sub-query/968399#968399 – mechanical_meat Jun 09 '09 at 07:04
  • Yes but the performance difference was very little. But since it simplifies the query I might use it. – vdh_ant Jun 10 '09 at 01:53
0

Options available in general include:

  • Store the illustrated data in a temp table, then query the temp table.
  • Use a WITH clause to define the complex query, then have the DBMS sort out the query.

The WITH clause effectively allows you to give a name to a sub-query; the optimizer will avoid re-evaluating it if at all possible. The TEMP table solution is likely to be the simplest. And that will do GROUP BY of ID and MIN(rank) and join back.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
0

Why is getting the data set so expensive, I see nothing terribly complex here. Do you have the indexes you need, is the query using them? Are the statistics out of date?

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • For the purposes of the question I have simplified the scenario. Basically the table is a Table_valued Function which unions results from 2 other Table_valued Functions each which use about 6 temp tables to build up the results. This is due to the level of normalization that is present in the database and how much data need to be derived to build up a picture of the data. Really this data should be captured in a Materialize View or something similar. But I can't make any change like this in this release cycle. cheers – vdh_ant Jun 10 '09 at 01:35