0

As part of some solution building, I had to implement a view which is performing a running total (cumulative sum calculation). I took the most simple and basic approach of joining table on table with list of dates but it seems that the view is still fairly slow. Addition of indexes on the table didn't help much, even though the table itself have just 15K rows or so. I was wondering if someone could advice on what would be the right approach to speed it up?

There are several considerations:

  1. I need to calculate cumulative sum up to a date for specific ProjectID and ContractorID. So for same date I may have a lot of ProjectIDs and ContractorIds combinations, but combination of Date, ProjectID and ContractorID is always unique

  2. There is a master table with dates, projectids (but no contractorids) and I need a cumulative sum for each date, projectid in this master dates table

  3. I need to calculate a cumulative sum of several columns at the same time, not just of one

To walk you through the situation slightly more, the tables I have are:

  • dbo.Project_Reporting_Schedule which holds a master list of projectid, dates. For each of this combinations I need to calculate a cumulative sum based on another table. Please note it has no contractorid!

  • Project_value_delivery is a table where I have actual value columns to perform a cumulative sum calculation. It has its own set of dates which may or may not match dates in Project_Reporting_Schedule, hence we can't just join the table on itself. Please also note it has contractorid!

Currently I have the following select which is rather self-explanatory and just joins table with values on table with master date list and does the summation. Select works fine, but even with just 15K records it takes almost 5 seconds to run which is fairly slow.

select 
    pv2.ProjectID,
    pv2.ContractorID,
    pv1.Date, 
    sum(pv2.ValuePlanned) as PlannedCumulative, 
    sum(pv2.ValueActual) as ActualCumulative,
    sum(pv2.MobilizationPlanned) as MobilizationPlanned,
    sum(pv2.MobilizationActual) as MobilizationActual,
    sum(pv2.EngineeringPlanned) as EngineeringPlanned,
    sum(pv2.EngineeringActual) as EngineeringActual,
    sum(pv2.ProcurementPlanned) as ProcurementPlanned,
    sum(pv2.ProcurementActual) as ProcurementActual,
    sum(pv2.ConstructionPlanned) as ConstructionPlanned,
    sum(pv2.ConstructionActual) as ConstructionActual,
    sum(pv2.CommisioningTestingPlanned) as CommisioningTestingPlanned,
    sum(pv2.CommisioningTestingActual) as CommisioningTestingActual
from 
    dbo.Project_Reporting_Schedule as pv1
join 
    dbo.Project_value_delivery as pv2 on pv1.Date >= pv2.Date and pv1.ProjectID = pv2.ProjectID
group by 
    pv2.ProjectID, pv2.ContractorID, pv1.Date

UPDATE

For further clarifications, I put execution plan here: https://www.brentozar.com/pastetheplan/?id=H12t-O1PS

Indexes created are same and on both tables I have them for Projectid, Date combination as well as standalone indexes on ProjectID and Date columns.

All indexes are Unique Nonclustered where applicable or just Nonclustered where applicable.

We can see it does 'non-clustered index seek' which costs most of the execution. Maybe index needs to be adjusted?

Maksim Khaitovich
  • 4,742
  • 7
  • 39
  • 70
  • You may have to define an index on pv1.Date, pv2.Date, ContractorId and ProjectID field if not done already – jidexl21 Sep 17 '19 at 18:28
  • Please post query plan (https://www.brentozar.com/pastetheplan/). Please also include details of the indexes you have created. – Alex Sep 17 '19 at 23:56
  • @Alex this is done – Maksim Khaitovich Sep 18 '19 at 08:52
  • Looking at the plan I can see a table scan on Project_value_delivery. I am not sure if adding indexes will help in this case as you are summing a lot of columns. Try using windowed functions (https://stackoverflow.com/a/13331102/6305294) to see if this will help. – Alex Sep 20 '19 at 05:33
  • @Alex thank you, windowed functions were indeed a way to solve it - they offered dramatic increase in speed. It wasn't very clear though how to apply them properly in my case but I figured it out – Maksim Khaitovich Oct 14 '19 at 09:07

2 Answers2

1

OK, so per suggestion from @Alex in the comments windowed functions are a way to go. The below code works lightning-fast compared to original code:

select 
       pv2.ProjectID,
       pv2.ContractorID,
       pv1.Date, 
       sum(pv2.ValuePlanned) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as PlannedCumulative, 
       sum(pv2.ValueActual) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as ActualCumulative,
       sum(pv2.MobilizationPlanned) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as MobilizationPlanned,
       sum(pv2.MobilizationActual) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as MobilizationActual,
       sum(pv2.EngineeringPlanned) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as EngineeringPlanned,
       sum(pv2.EngineeringActual) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as EngineeringActual,
       sum(pv2.ProcurementPlanned) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as ProcurementPlanned,
       sum(pv2.ProcurementActual) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as ProcurementActual,
       sum(pv2.ConstructionPlanned) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as ConstructionPlanned,
       sum(pv2.ConstructionActual) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as ConstructionActual,
       sum(pv2.CommisioningTestingPlanned) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as CommisioningTestingPlanned,
       sum(pv2.CommisioningTestingActual) over (partition by pv2.ProjectID, pv2.ContractorID order by pv1.Date ROWS between unbounded preceding and current row) as CommisioningTestingActual
from 
       dbo.Project_Reporting_Schedule as pv1
       join dbo.Project_value_delivery as pv2 on pv1.Date = pv2.Date and pv1.ProjectID = pv2.ProjectID
Maksim Khaitovich
  • 4,742
  • 7
  • 39
  • 70
0

Take the comparison out of the JOIN clause and move it to a WHERE clause:

select 
       pv2.ProjectID,
       pv2.ContractorID,
       pv1.Date, 
       sum(pv2.ValuePlanned) as PlannedCumulative, 
       sum(pv2.ValueActual) as ActualCumulative,
       sum(pv2.MobilizationPlanned) as MobilizationPlanned,
       sum(pv2.MobilizationActual) as MobilizationActual,
       sum(pv2.EngineeringPlanned) as EngineeringPlanned,
       sum(pv2.EngineeringActual) as EngineeringActual,
       sum(pv2.ProcurementPlanned) as ProcurementPlanned,
       sum(pv2.ProcurementActual) as ProcurementActual,
       sum(pv2.ConstructionPlanned) as ConstructionPlanned,
       sum(pv2.ConstructionActual) as ConstructionActual,
       sum(pv2.CommisioningTestingPlanned) as CommisioningTestingPlanned,
       sum(pv2.CommisioningTestingActual) as CommisioningTestingActual
       FROM
       dbo.Project_Reporting_Schedule as pv1
       join dbo.Project_value_delivery as pv2 on pv1.ProjectID = pv2.ProjectID
       WHERE pv1.Date >= pv2.Date
       GROUP BY pv2.ProjectID, pv2.ContractorID, pv1.Date
daShier
  • 2,056
  • 2
  • 8
  • 14