22

I've the following table (my_data):

year |  X  |  Y
-----+-----+-----
2010 |  A  |  10
2011 |  A  |  20
2011 |  B  |  99
2009 |  C  |  30
2010 |  C  |  40


what is the best / smallest SQL statement to retrieve only the data related to the highest year and grouped by 'X' , like this:

year |  X  |  Y
-----+-----+-----
2011 |  A  |  20
2011 |  B  |  99
2010 |  C  |  40


Note that this result table will be used in a join.

Stef Heyenrath
  • 9,335
  • 12
  • 66
  • 121

11 Answers11

28
select year, x,y
from (
      select year, x, y, max(year) over(partition by x) max_year
      from my data
      )
where  year = max_year
schurik
  • 7,798
  • 2
  • 23
  • 29
14
select * from (
  select year, x, y, row_number() over (partition by x order by year desc ) rn 
  from my_data
) where rn = 1
Frank Schmitt
  • 30,195
  • 12
  • 73
  • 107
  • Thanks for this answer, it works correct. However I did accept the 'shurik' for the answer because it's a little bit more easy to understand. – Stef Heyenrath Jun 01 '11 at 08:53
4

You could also be portable and use an OUTER JOIN :

select t1.year, t1.x, t1.y
  from my_data t1
  left join my_data t2
    on t2.x = t1.x
   and t2.year > t1.year
 where t2.x is null
Benoit
  • 76,634
  • 23
  • 210
  • 236
4

It's a lot simpler than the other solutions:

SELECT x, max(year), MAX(y) KEEP (DENSE_RANK FIRST ORDER BY year DESC)
  FROM table
  GROUP BY x
Gary Myers
  • 34,963
  • 3
  • 49
  • 74
2
select year, x, y 
 from my_data stable 
where stable.year = (select max(year) 
                     from my_data tables 
                     where tables.x = stable.x);
Frank Schmitt
  • 30,195
  • 12
  • 73
  • 107
minglotus
  • 83
  • 6
2

You can use common table expression (CTE), works also with duplicated rows (if required) Execution plan is the same, more or less

;With my_data_cte as (
    SELECT [year], x,y,ROW_NUMBER() OVER (
        PARTITION BY x
        ORDER BY [year] desc) as rn
FROM [dbo].[my_data])
select [year], x,y from my_data_cte 
where rn = 1
Artyom
  • 21
  • 2
1

Gary Myers, your solution does not work, if, for example, for value A, year is smaller than 2010 and that year has maximum value. (FOR example, if row 2005,A,50 existed) In order to get correct solution, use the following. (which just swaps values)

SELECT x, max(y), MAX(year) KEEP (DENSE_RANK FIRST ORDER BY y DESC)
FROM test
GROUP BY x
alext
  • 678
  • 1
  • 11
  • 25
0
-- I had a slightly different case and just wandering why this one should't work 
SELECT my_data.x , my_data.y , my_data1.max_year 
FROM my_data
INNER JOIN 
( 
  SELECT x , max (year ) as max_year
  FROM my_data
  -- WHERE 1=1
  -- AND FILTER1=VALUE1
  GROUP BY my_data.x
) my_data1
ON ( my_data.x = my_data1.x )
Stevoisiak
  • 23,794
  • 27
  • 122
  • 225
Yordan Georgiev
  • 5,114
  • 1
  • 56
  • 53
  • Why `LEFT JOIN` rather than `INNER JOIN`? Why `WHERE 1=1`? Also using a join is less efficient than using `MAX(...) KEEP( DENSE_RANK ... )`. – MT0 Jul 23 '15 at 09:14
  • txn for the correction => fixed to inner join + added the where for dev user friendliness ... – Yordan Georgiev Jul 23 '15 at 10:48
0

You can select the most recent year for each X by using a subquery in your criteria:

select a.year, a.x, a.y
from my_data a
where
  a.year = (
    select max(a_yr.year) from my_data a_yr
    where a_yr.x = a.x);

Data:

year |  X  |  Y
-----+-----+-----
2010 |  A  |  10
2011 |  A  |  20
2011 |  B  |  99
2009 |  C  |  30
2010 |  C  |  40

Results:

year |  X  |  Y  
-----+-----+-----
2011 |  A  |  20 
2011 |  B  |  99 
2010 |  C  |  40 

Based on my limited testing, this method seems to be faster than using partition by.

Stevoisiak
  • 23,794
  • 27
  • 122
  • 225
-3

this can also be the solution

select greatest( (e),(g),(c),(a),(b) ) as latestdate from abc

  • Thank you for this code snippet, which may provide some immediate help. A proper explanation [would greatly improve](//meta.stackexchange.com/q/114762) its educational value by showing *why* this is a good solution to the problem, and would make it more useful to future readers with similar, but not identical, questions. Please [edit] your answer to add explanation, and give an indication of what limitations and assumptions apply. – Toby Speight Aug 02 '17 at 09:58
-4

The simplest is

Select * 
from table 
where year = (select max(year) from table)

It will possibly cause a table scan unless there is an index on year. But with an index should be performant

Karl
  • 3,312
  • 21
  • 27