0

I’m trying to work out when a CROSS APPLY works out better, and I have to say it’s yet to convince me.

I have found one use case involving the most recent sales per customer. I can see that it is certainly slick and it enables me to get more than one sale per customer:

FROM #customers AS c CROSS APPLY (
    SELECT *
    FROM #sales AS s
    WHERE s.customerid=c.id
    ORDER BY s.date DESC OFFSET 0 ROWS FETCH FIRST 1 ROW ONLY
) AS s;

(I have fetched only one row for comparison purposes).

Using a CTE and multiple joins, I can get a similar result using:

WITH CTE(customer,last_order) AS (
    SELECT customerid, max(date)
    FROM #sales
    GROUP BY customerid
)
SELECT
    *
FROM
    #sales AS s
    JOIN cte ON s.customerid=cte.customer AND s.date=cte.last_order
    JOIN customers AS c ON s.customerid=c.id
;

which looks messier, and only gives me one sale per customer.

However, I find that the latter is much lest costly than the former. In my sample data, the CROSS APPLY costs 74% while the CTE with joins costs 26%. With a larger set of data I get 98% vs 2%.

I have set up a fiddle at https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=18b7cf86195a56552bacd7e985da898c .

Is there something wrong with the way I did my CROSS APPLY, or is it inherently more expensive?

Manngo
  • 14,066
  • 10
  • 88
  • 110
  • 2
    SQL doesn't work like that. You build a query *telling* the engine the results you want to receive, *not* how to obtain them. It works out the best way to obtain them - which depends on a whole bunch of factors (indexes, statistics etc). So a given construct might perform better with one dataset and not with another. You'll only know when you try it. – Dale K May 04 '22 at 07:43
  • @DaleK I know that SQL code is a suggestion only. The costs come from the Estimated Execution Plan, and I did try with a different set of data. – Manngo May 04 '22 at 07:45
  • 2
    My point is there isn't a rule as to when a given construct will perform better, it totally depends on the data. – Dale K May 04 '22 at 07:46
  • 4
    The typical advice when it comes to SQL, is write your query the way that makes most logical sense to you, and only worry about performance if it becomes an issue. – Dale K May 04 '22 at 07:47
  • 3
    Your example is not really representative of the real world and, as pointed out, does not take into account a whole host of factors. With a suitable index on CustomerID, date, both queries yield a 50% estimated execution plan and I would suggest with a larger data set cross-apply would be better, and as a SQL tool it provides more flexability than a standard join. – Stu May 04 '22 at 08:01
  • A better example than the CTE would be a row-number solution, which could get one row per customer. See eg https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group – Charlieface May 04 '22 at 16:51

0 Answers0