0

I've heard of cross apply, but have to date never had a time I thought I needed to use it. Below is a sample of something I've done more then a few times, and then the same using cross apply.

edit* Can someone explain what the difference (if any) there is between the join example and the cross apply example for this situation

    DROP TABLE IF EXISTS #test
/*creating test table and inserting test rows*/
create table #test (id char(1), val int, dt date)

        insert into #test values ('a',1,'2022/1/6')
        insert into #test values ('a',4,'2022/1/7')
        insert into #test values ('a',7,'2022/1/8')
        insert into #test values ('a',5,'2022/1/9')
        insert into #test values ('a',3,'2022/1/10')
        insert into #test values ('a',1,'2022/1/11')
        insert into #test values ('b',1,'2022/1/6')
        insert into #test values ('b',4,'2022/1/7')
        insert into #test values ('b',6,'2022/1/8')
        insert into #test values ('b',8,'2022/1/9')
        insert into #test values ('b',3,'2022/1/10')
        insert into #test values ('b',5,'2022/1/11')
        insert into #test values ('c',1,'2022/1/6')
        insert into #test values ('c',4,'2022/1/7')
        insert into #test values ('c',6,'2022/1/8')
        insert into #test values ('c',8,'2022/1/9')
        insert into #test values ('c',3,'2022/1/10')
        insert into #test values ('c',5,'2022/1/11')
     -- this is the way I've been accomplishing this task  
    SELECT t.id
        ,t.val
        ,t.dt
    FROM #test t
    JOIN (
        SELECT max(dt) max_dt
            ,id
        FROM #test
        GROUP BY id
        ) maxt ON maxt.max_dt = t.dt
        AND maxt.id = t.id
    ORDER BY t.id
    
    --this gives the same results but using cross apply
    select  t1.*
    from #test t1
    cross apply
    (
    select max(t2.dt) dt, id
    from #test t2 where t2.id = t1.id
    group by id
    ) cx
    where cx.dt = t1.dt
Just_Some_Guy
  • 330
  • 5
  • 24
  • Does this answer your question? [Get top 1 row of each group](https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group)? – Thom A Apr 14 '22 at 12:39
  • sort of. that's also a method I've done before, though I thought that was a 'sloppier way' of doing it. I could be wrong; I'll read through those discussions. for the sake of this I'm more interested in if there is a material difference between cross apply and join – Just_Some_Guy Apr 14 '22 at 12:50
  • `JOIN` and `CROSS APPLY` are similar yes; that isn't what your question asks though. – Thom A Apr 14 '22 at 13:02
  • fair. I edited the title and question to reflect my primary curiosity here. the link to the other question satisfys that part of my question – Just_Some_Guy Apr 14 '22 at 13:21
  • If you want to know what is different (from the data engine's perspective?) you would start by looking at the [plan](https://www.brentozar.com/pastetheplan/?id=SkYl-jrEq). – Thom A Apr 14 '22 at 13:27

1 Answers1

0

Below is the way I would solve this:

SELECT t.id, t.val, t.dt
FROM
(
  SELECT t.*, RN = ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.dt DESC)
  FROM   #test AS t
) AS t
WHERE t.RN = 1;

The way I determine the most efficient method is to review the execution plan(s). Lets compare the execution plans for two methods you posted, and to the TOP solution I just posted.

Execution plans: enter image description here

In this case there isn't any difference between apply and the inner join you are using. The will bot return the same data using identical execution plans. The cost of all three are even (relatively).In these plans, the majority of the work will be with the sort operator required to determine the MAX value.

Let's add a proper index for this query:

CREATE NONCLUSTERED INDEX nc_test1 ON #test(id, dt DESC) INCLUDE(val);

Now let's re-run the three queries:

enter image description here

With the index in place, the TOP approach is the best because it does not require a self-join, and therefore does less than 1/2 the work.

Alan Burstein
  • 7,770
  • 1
  • 15
  • 18