1

Using SQL Server 2008 R2, I have a view, vwBASECustomerTransactions, with a list of transactions, which sum the aggregate data for each transaction type for each day:

Create View vwBASECustomerTransactions AS
Select CustomerID, 0 as TransType, Sum(Amount) as TransAmount, Cast(ChargeDate as Date) as TransDate
from Charge 
Group by CustomerID, Cast(ChargeDate as Date)

UNION ALL

Select CustomerID, 1, Sum(Amount), Cast(AdjustDate as Date)
from Adjustment
Group by CustomerID, Cast(AdjustDate as Date)

UNION ALL

Select CustomerID, 2, Sum(Amount), Cast(PaymentDate as Date)
from Payment
Group by CustomerID, Cast(PaymentDate as Date)

The result is:

CustomerID |  TransType  |  TransAmount  |  TransDate
-----------------------------------------------
120        |  0          |  100          |  1/1/2014
120        |  2          | -100          |  1/1/2014
120        |  0          |  50           |  17/2/2014

There are over 200,000 CustomerIDs and around 2 million Transactions.

I need to find the date where the Balance was the greatest for each CustomerID.

I have created a set of views with a balance calculated at each transaction date, which simply sums the transactions to that point in time, but it is very slow:

--Get the Net Daily Change
Create View vwBASECustomerNetBalChange as
Select CustomerID, TransDate, Sum(TransAmount) as Amount
from vwBASECustomerTransactions
Group by CustomerID, TransDate

--Get the Running Balance for any date
Create View vwRPTCustomerDailyBalance as
Select *, (Select Sum(Amount) from vwBASECustomerNetBalChange Where TransDate <= a.TransDate and CustomerID = a.CustomerID ) as Balance
from vwBASECustomerNetBalChange

--Get the Max Balance for any Customer (join back to get date)
--Takes > 10 minutes to run
Select CustomerID, Max(Balance) 
from vwRPTCustomerDailyBalance
group by CustomerID

Is there a better, more efficient way?

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
Molloch
  • 2,261
  • 4
  • 29
  • 48
  • IMHO,I think your queries are ok than using row_number or rank except "Get the Running Balance for any date" query.That query is bad. Also if you can avoid Cast(Col as Date) then it will be great.this way it can take advantage of index.secondly you can create index view.No doubts doing same stuff in proc will be faster . – KumarHarsh Jul 17 '14 at 04:55

4 Answers4

1

You can try this one to see if it runs any better that what you are seeing. After I finally figured out what it was exactly you were trying to do it is pretty much similar to your version :( The Union in the first subquery is going to basically select distinct values which will slow things down.

SELECT DistinctDays.CustomerID,DistinctDays.TransDate,TotalAmount=SUM(TotalAmount.TransAmount)
FROM
(
    Select CustomerID, Cast(ChargeDate as Date) as TransDate from Charge
    UNION 
    Select CustomerID, Cast(AdjustDate as Date) from Adjustment     
    UNION 
    Select CustomerID, Cast(PaymentDate as Date) from Payment

)AS DistinctDays
INNER JOIN
(
    Select CustomerID, Amount as TransAmount, ChargeDate as TransDate from Charge         
    UNION ALL
    Select CustomerID,  Amount, AdjustDate from Adjustment     
    UNION ALL
    Select CustomerID,Amount, PaymentDate from Payment
) 
AS TotalAmount ON TotalAmount.CustomerID=DistinctDays.CustomerID AND TotalAmount.TransDate<=DistinctDays.TransDate 
GROUP BY DistinctDays.CustomerID,DistinctDays.TransDate
ORDER BY DistinctDays.CustomerID,DistinctDays.TransDate
Ross Bush
  • 14,648
  • 2
  • 32
  • 55
  • This is extremely fast, but isn't it returning the largest transaction, rather than the largest daily balance? – Molloch Jul 17 '14 at 03:43
  • Thanks again, the 3rd query works really well, but it still give just the daily net change in amount (sum of all trans that day) and I need the Sum of ALL transactions to that day (ie, running balance per customer). :) – Molloch Jul 18 '14 at 02:36
  • Took a wrong stab the fist attempt. Does the query above run any faster? Similar to the timing you are seeing? – Ross Bush Jul 18 '14 at 22:15
  • Thank you for your help. This gives very similar results to my query, time wise. However I used your "balance on date" query for another version of the report - so thanks for that. :) – Molloch Jul 22 '14 at 07:07
0

This is too long for a comment.

Unfortunately, SQL Server 2008 doesn't directly support cumulative sum. I suspect, though, that there might be more efficient ways to get what you want. SQL Server 2012 does support cumulative sum.

I think the best approach would be to use a temporary table rather than a view. On the temporary table, you can add indexes for CustomerId, date, which should be a big help for the correlated subquery.

Another approach would be to do the cumulative sum at the table level, inside the view. This can take advantage of indexes at the table level. However, I think the temporary table approach is probably the best solution.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Will this give you the result ? its using direct table joins.

SELECT * FROM 
(
SELECT ROW_NUMBER() OVER(PARTITION BY CUSTOMER_ID ORDER BY BALANCE DESC) AS ID,*
FROM  
(SELECT A.CUSTOMERID,A.TRANSDATE,SUM(A.AMOUNT+B.AMOUNT+C.AMOUNT) AS BALANCE
FROM CHARGE  A
JOIN ADJUSTMENT B
ON A.CUSTOMERID=B.CUSTOMERID
AND CAST(A.TRANSDATE AS DATE)= CAST(A.TRANSDATE AS DATE)
JOIN PAYMENT C
ON A.CUSTOMERID=C.CUSTOMERID
AND CAST(A.TRANSDATE AS DATE)= CAST(C.TRANSDATE AS DATE)
GROUP BY A.CUSTOMERID,A.TRANSDATE)
)STAB
 WHERE ID=1

we can use where conditions if needed.Since i don't understand the running balance i skipped the where clause.

Recursive
  • 954
  • 7
  • 12
0

How to get cumulative sum

This was the solution I used.

set transaction isolation level read uncommitted /*prevents table locking*/

select top 1000 SUM(ad.TRN_RECPT_AMT+ad.TRN_DISC_TAKEN_AMT) OVER(PARTITION BY ad.INVC_ID ORDER BY ad.TIME_STAMP ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)[pay_td],
ah.INVC_AMT - 
SUM(ad.TRN_RECPT_AMT+ad.TRN_DISC_TAKEN_AMT) OVER(PARTITION BY ad.INVC_ID ORDER BY ad.TIME_STAMP ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)[bal_td],
ah.CUST_ID ,ah.PROJ_ID, ah.INVC_ID, ah.INVC_AMT,ad.INVC_AMT,
ad.TRN_DT,
ad.S_JNL_CD,
ad.RECPT_DT ,ad.TRN_RECPT_AMT,
ad.TRN_DISC_TAKEN_AMT, 
ad.TIME_STAMP

FROM DELTEKCP.DELTEK.AR_HDR_HS ah
join DELTEK.AR_DETL_HS AS ad on ah.INVC_ID = ad.INVC_ID
order by ah.CUST_ID,ah.INVC_ID,ad.TIME_STAMP desc
;