1

I'm trying to optimize the performance of the following update query:

UPDATE a
 SET a.[qty] =
      (
       SELECT MAX(b.[qty])
       FROM [TableA] AS b
       WHERE b.[ID] = a.[ID]
         AND b.[Date] = a.[Date]
         AND b.[qty] <> 0
      )
 FROM [TableA] a
 WHERE a.[qty] = 0
  AND a.[status] = 'New'

It deals with a large table with over 200m. rows.

I've already tried to create an index on [qty,status], but it was not really helpfull due to the index update at the end. Generally it is not so easy to create indexes on this table, cause there are a lot other update/insert-queries. So I'm think to reorganize this query somehow. Any ideas?

TableA is a heap like this:

CREATE TABLE TableA (
    ID            INTEGER       null,
    qty           INTEGER       null,
    date          date          null,
    status        VARCHAR(50)   null,
);

Execution plan: https://www.brentozar.com/pastetheplan/?id=S1KLUWO15

Notna
  • 33
  • 5
  • 4
    At a very minimum you should [include the actual Execution Plan](https://stackoverflow.com/a/7359705/1260204), you could use [Paste the Plan](https://www.brentozar.com/pastetheplan/) and share the link in your question. Also try to read it yourself, maybe you can figure out the performance issue(s) with your query. Finally include the [schema DDL](https://en.wikipedia.org/wiki/Data_definition_language) along with the query you are executing. Looking at the query plan, what is the most expensive step(s)? – Igor Feb 14 '22 at 16:16
  • Also I would normally assume that a column named `ID` would be unique and the primary key. If that assumption is true then this query makes no sense to me as you are just setting the qty to itself. Maybe that is part of the problem though? This illustrates again why a schema DDL is critical to helping you. – Igor Feb 14 '22 at 16:20
  • @Igor: no, it is heap, the ID field is not unique. According to the execution plan, the most costly operation is the update one (70%). – Notna Feb 14 '22 at 16:23
  • You could likely use an updatable CTE here, which would avoid 2 references to the table. – Thom A Feb 14 '22 at 16:27
  • A heap of 200m rows is going to be slow whatever you do, you really need to index it properly – Charlieface Feb 14 '22 at 21:47
  • The table has 353 million rows. It estimates 179 million match `qty = 0 and status = 'New'`. And estimates that 136.15 million match `qty <> 0` - and grouping those by `id,date` will only collapse things down to 135.96 million rows. How accurate are these estimates? You should post the **actual** plan not estimated. How frequently do you need to run this `UPDATE` query too? If just a one off then cost of creating indexes may not be worthwhile vs just running it but otherwise you probably should index it. – Martin Smith Feb 14 '22 at 22:12

1 Answers1

0

It's difficult to answer without seeing execution plans and table definitions, but you can avoid self-joining by using an updatable CTE/derived table with window functions

UPDATE a
SET
  qty = a.maxQty
FROM (
    SELECT *,
      MAX(CASE WHEN a.qty <> 0 THEN a.qty END) OVER (PARTITION BY a.ID, a.Date) AS maxQty
    FROM [TableA] a
) a
WHERE a.qty = 0
  AND a.status = 'New';

To support this query, you will need the following index

TableA (ID, Date) INCLUDE (qty, status)

The two key columns can be in either order, and if you do a clustered index then the INCLUDE columns are included automatically.

Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • Off the top of my head the first index that comes to mind is (qty, status). The second one would be (ID, Date, qty). – The Impaler Feb 14 '22 at 17:20
  • Not if you are doing the above window function, because the `qty` and `status` filters are on the outside, and will not use the index. In certain circumstances, self-joining (as in the question) would be better, and there you may want one of your indexes – Charlieface Feb 14 '22 at 21:46
  • I stand corrected. There's no filtering in the subquery, and the ordering should be optimized. – The Impaler Feb 14 '22 at 22:28
  • @Charlieface, the problem is, if I create an index containting or including the qty field, the update costs of that index are very high and exceed the benefits of using it within the CTE. – Notna Feb 14 '22 at 23:06