3

Important: I need this to be SQL Server 2000 compatible.

I need a cumulative sum but based on current date field, an update query like the one I’m using now works but it’s terribly slow (ugly implicit RBAR triangular join):

UPDATE #RPT 
SET DailySumAccum = 
      (SELECT SUM(COALESCE(CTACTE2.Amount,0))
       FROM #RPT_CTACTE CTACTE2
       WHERE CTACTE2.IsAvailable = #RPT_CTACTE.IsAvailable 
         AND CTACTE2.CodCustomer = #RPT_CTACTE.CodCustomer  
         AND CTACTE2.ItemType = #RPT_CTACTE.ItemType  
         AND CTACTE2.CodItem = #RPT_CTACTE.CodItem 
         AND EsCtaCorrienteMon = -1 
         AND DATEDIFF(day, CTACTE2.OrderDate, #RPT_CTACTE.OrderDate) >= 0) 
WHERE #RPT_CTACTE.EsSaldoAnterior = 0
  AND EsCtaCorrienteMon = -1

I’ve also tried using the cursor method for running totals, but I can’t fix the RBAR problem with it, see the date comparison is still there and it’s still terribly slow:

-- this is inside the cursor

IF (@EsCtaCorrienteMon = -1)
BEGIN
    SELECT @NetoDiarioAcum = SUM(COALESCE(Amount,0))
    FROM #RPT_CTACTE
    WHERE IsAvailable = @IsAvailable
          AND CodCustomer = @CodCustomer
          AND ItemType = @ItemType
          AND CodItem = @CodItem
          AND EsCtaCorrienteMon = -1
          AND DATEDIFF(day, OrderDate, @OrderDate) >= 0
END IF

So, the problem with this cumulative sum is that date comparison there, it’s making the query super slow (I stopped the code after running for 10 minutes, commenting this Update the cursor takes only 3 seconds to run), the row count I’ve tested are 28K, it looks the time taken is going up exponentially when the record count increases, that’s why I assume the problem is there is a RBAR going on here (meaning "Row-By-Agonizing-Row").

EDIT: After some testing it looks the date is not the only problem, is there any way to make this Running total just by summing a variable inside a cursor or something like that?

EDIT2: I’m currently looking for a way to make what the first update does (a day running total based on 4 fields and date) but faster, I’ve been close to it yesterday adding the logic manually inside a cursor but the cursor went too big and hard to mantain, so what running total technique is the best (faster) for this case? And how would you implement that technique here? It’s not a simple running total, cuts should be made when any of those fields change. The field EsSaldoAnterior change can only be 0 or -1, the update only affect the outer table when this field is 0 but the inner running totals sum even when this field is -1. EsSaldoAnterior means something like "Its Previous Amount" and it means the running totals shounldn’t start at zero, they should start summing these amounts first (when they exist), this is the order if using ORDER BY:

ORDER BY IsAvailable, CodCustomer, ItemType, CodItem, OrderDate, EsSaldoAnterior
Palec
  • 12,743
  • 8
  • 69
  • 138
Jcis
  • 153
  • 2
  • 15
  • What does the query plan say? – paul Dec 02 '13 at 15:13
  • It's not really easy to get the plan because this temp table is created at runtime for each execution, i can try to make it a real table then execute the plan, but: you can be sure this is a regular temp table with no index, so we can assume the worse, do you think that creating indexes in the temp table (at runtime before running this code) will make things faster? – Jcis Dec 02 '13 at 15:20
  • Anybody trying to help please read EDIT2 in 1st post (at bottom) – Jcis Dec 05 '13 at 13:49
  • 1
    Take a look at [Best approaches for running totals](http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals) by Aaron Bertrand. – Nikola Markovinović Dec 05 '13 at 14:09
  • 1
    Don't use datediff to compare dates, because this will invalidate indexes you might have on them. Use simple comparison: `OrderDate < @OrderDate`. – Nikola Markovinović Dec 05 '13 at 14:18

2 Answers2

3

Try to change

AND DATEDIFF(day, CTACTE2.OrderDate, #RPT_CTACTE.OrderDate) >= 0) 

to

CTACTE2.OrderDate<DATEADD(dd, 0, DATEDIFF(dd, 0, #RPT_CTACTE.OrderDate)+1)

or

CTACTE2.OrderDate<DATEADD(dd, 1, DATEDIFF(dd, 0, #RPT_CTACTE.OrderDate))

In this case subquery should use index on CTACTE2.OrderDate

valex
  • 23,966
  • 7
  • 43
  • 60
  • Well, actually the temp table doesn't have any index, but i'll create one for that field and try this, thanks. – Jcis Dec 02 '13 at 15:42
  • It's slow, and it looks like my entire update is slow, not only the date comparison part, is there any way to sum using a variable inside a cursor? i've done this before but the problem is how to take the current date into account. – Jcis Dec 02 '13 at 16:05
  • I was really hoping that somebody could show the running total code for this but i think it was too hard to do it manually (like it's done inside cursors, summing using variables). But i'll award this answer because it was helpful and informative for me anyway, Thanks valex. – Jcis Dec 12 '13 at 22:09
1

To make the inner subselect fast you need to add an index on the columns used to relate the inner query to the temptable:

CodCustomer
ItemType
CodItem
EsCtaCorrienteMon
IsAvailable
OrderDate

Add the columns with the most different values (i.e. with high selectivity) first - do not have IsAvailable first since it only has two different values.

Keep the OrderDate column last in the index to enable fast range searches within the group to make the date comparison efficient. If there are few dates per group then this comparison should not be a problem.

Try making this index clustered so that the subselect finds its rows close to the outer temptable row.

Daniel B
  • 797
  • 4
  • 13