1

Our database is SQL Server 2014. I'll simplify my example to explain my problem.

TableA has columns:

CustomerID (primary key), LatestOrderedItem, LatestOrderDate

TableB has columns:

OrderID, CustomerID, OrderedItem, PurchaseDate

I want to update the LatestOrderedItem and LatestOrderDate of TableA with the latest/most recent order of a customer from TableB (based on purchase date).

What's the best / most efficient update statement for this? (In actuality, both tables have millions of records in our situation.)

Approaches I tried but are still very slow:

Approach 1:

UPDATE a
SET LatestOrderedItem = (SELECT TOP 1
                                OrderedItem
                         FROM TableB
                         WHERE CustomerID = a.CustomerID
                         ORDER BY PurchaseDate DESC),
    LatestOrderDate = (SELECT TOP 1
                              PurchaseDate
                       FROM TableB
                       WHERE CustomerID = a.CustomerID
                       ORDER BY PurchaseDate DESC)
FROM TableA a;

Approach 2:

UPDATE a
SET LatestOrderedItem = b.OrderedItem,
    LatestOrderDate = b.PurchaseDate
FROM TableA a
     INNER JOIN TableB b ON a.CustomerID = b.CustomerID
WHERE NOT EXISTS (SELECT 1
                  FROM TableB b2
                  WHERE b2.CustomerID = b.CustomerID
                    AND b2.PurchaseDate > b.PurchaseDate);
Thom A
  • 88,727
  • 11
  • 45
  • 75
Kristian
  • 11
  • 1
  • 7
    Personally, I wouldn't store the data in `TableA` anyway. If you need that information, then I would create a `VIEW`. Otherwise your data is going to need be updated every time `tableB` is updated. – Thom A Jan 24 '19 at 15:36
  • Thanks, @Larnu, for editing my first-ever post. Will learn how to format it better next time. – Kristian Jan 24 '19 at 15:51

3 Answers3

0

Try with a CROSS APPLY and a TOP.

UPDATE a SET 
    LatestOrderedItem = c.OrderedItem,
    LatestOrderDate = c.PurchaseDate
FROM 
    TableA a
    CROSS APPLY (
        SELECT TOP 1
            b.OrderedItem,
            b.PurchaseDate
        FROM
            TableB b
        WHERE
            a.CustomerID = b.CustomerID
        ORDER BY
            b.PurchaseDate DESC
    ) AS C

The following index is key to this update's performance. Can optionally include the OrderedItem column, or make the index clustered, although this has side effects.

CREATE NONCLUSTERED INDEX NCI_TableB_CustomerID_PurchaseDate ON TableB (CustomerID, PurchaseDate) -- INCLUDE (OrderedItem)

PD: Please read Larnu's comment, as keeping this data stored in a table is implying denormalization and is calling for trouble on the long run.

EzLo
  • 13,780
  • 10
  • 33
  • 38
0

Another option would be a window function in a derived table.

update a
    set a.LatestOrderedItem = b.OrderedItem
        ,a.LatestOrderDate = b.PurchaseDate
from TableA a
inner join
    (select CustomerID, OrderedItem, PurchaseDate, 
        row_number() over (partition by CustomerID order by PurchaseDate desc) as seq 
        from TableB) on b.CustomerID = a.CustomerID and b.seq = 1
S3S
  • 24,809
  • 5
  • 26
  • 45
0

Window function, one scan for both columns to be updated:

UPDATE a
SET a.LatestOrderedItem = b.OrderedItem,
    a.LatestOrderDate = b.PurchaseDate
FROM TableA a
    JOIN
    (
        SELECT CustomerID,
               OrderedItem,
               PurchaseDate,
               ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY PurchaseDate DESC) RN
        FROM TableB
        WHERE CustomerID = a.CustomerID
    ) b
        ON a.CustomerID = b.CustomerID
           AND b.RN = 1;
Alexander Volok
  • 5,630
  • 3
  • 17
  • 33