0

I'm working with a table such as this one:

create table example_table (ID Number(8), Year Number(8), Data Varchar(4));
insert into example_table
(ID,Year,Data)
(
select 1,2008,1 from dual union
select 1,2010,2 from dual union
select 2,2014,3 from dual union
select 2,2020,4 from dual union
select 2,2009,5 from dual union
select 3,2003,6 from dual union
select 4,2015,7 from dual union
select 4,2014,8 from dual);
select * from example_table;
ID Year Data
1 2008 1
1 2010 2
2 2014 3
2 2020 4
2 2009 5
3 2003 6
4 2015 7
4 2014 8

In this case, together the ID and Year columns form a unique identifier for a specific row. I only want to pull rows where the Year is maximized for a given ID, such as

Id Year Data
1 2010 2
2 2020 4
3 2003 6
4 2015 7

I know I can do this with a correlated subquery such as

select distinct 
       ID, Year, Data
from   example_table a
where  Year = 
       (select max(year)
        from   example_table b
        where  a.id = b.id);

Or I can do it with a common table expression:

with tmp(ID,Year) 
as (
    select distinct 
           ID,
           max(year) 
           over (partition by ID)
    from example_table)
select distinct 
       ID, Year, Data
from   example_table
       inner join tmp 
         on  example_table.ID = tmp.ID 
         and example_table.year = tmp.year;

I could technically also do it by creating another table/view, but I don't have perms on my database to do that. Anyways, this is a common task I'm having to do within my scripts and I want to reduce the number of queries I generate. Is there any way to do this with only one query?

I tried using a HAVING statement like:

select example_table.ID,
       max(example_table.YEAR),
       example_table.DATA 
from example_table
group by ID, DATA
having example_table.YEAR = max(example_table.YEAR);

but that doesn't seem to work because HAVING statements only function on groups, and I want to manipulate elements within the group.

Is there a way to do this with only one query, and if not, what's the most clear way to do it with two queries?

MT0
  • 143,790
  • 11
  • 59
  • 117
Jakob Lovern
  • 1,301
  • 7
  • 24
  • 1
    `top n per group` logic requires sub-query or its logical equivalent. Have a look: https://blogs.oracle.com/sql/how-to-select-the-top-n-rows-per-group-with-sql-in-oracle-database – PM 77-1 Mar 30 '21 at 18:59

1 Answers1

1

Why not just use aggregation?

select id, max(year), 
       max(data) keep (dense_rank first order by year desc) as data
from t
group by id;

The keep syntax is Oracle's way of implementing a "first_value" aggregation function.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786