4

I have a very simple table with a date (in days), equipment name and engine hours (cumulative) for that day in a SQL Server table. The raw data table shows there are gaps in the day dates. I need to fill the gaps and interpolate to provide hour values for those new rows. The "Desired result" table shows what the end product should look like.

My initial thinking is to create a "dates" table (recursive function) and use a left join to create the complete table, however filling the hour columns with interpolated data is beyond me at this stage. Any ideas?

Raw data

+------------+-----------+-------+--+--+
| Date       | Equipment | Hours |  |  |
+------------+-----------+-------+--+--+
| 2019/01/01 | EQ1       | 50    |  |  |
+------------+-----------+-------+--+--+
| 2019/01/02 | EQ1       | 67    |  |  |
+------------+-----------+-------+--+--+
| 2019/01/03 | EQ1       | 87    |  |  |
+------------+-----------+-------+--+--+
| 2019/01/04 | EQ1       | 105   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/07 | EQ1       | 150   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/08 | EQ1       | 169   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/09 | EQ1       | 187   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/12 | EQ1       | 247   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/13 | EQ1       | 265   |  |  |
+------------+-----------+-------+--+--+
|            |           |       |  |  |
+------------+-----------+-------+--+--+
| 2019/01/01 | EQ2       | 150   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/02 | EQ2       | 168   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/03 | EQ2       | 187   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/04 | EQ2       | 205   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/05 | EQ2       | 222   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/06 | EQ2       | 239   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/07 | EQ2       | 255   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/10 | EQ2       | 306   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/13 | EQ2       | 357   |  |  |
+------------+-----------+-------+--+--+

Desired result

+------------+-----------+-------+--+--+
| Date       | Equipment | Hours |  |  |
+------------+-----------+-------+--+--+
| 2019/01/01 | EQ1       | 50    |  |  |
+------------+-----------+-------+--+--+
| 2019/01/02 | EQ1       | 67    |  |  |
+------------+-----------+-------+--+--+
| 2019/01/03 | EQ1       | 87    |  |  |
+------------+-----------+-------+--+--+
| 2019/01/04 | EQ1       | 105   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/05 | EQ1       | 120   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/06 | EQ1       | 135   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/07 | EQ1       | 150   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/08 | EQ1       | 169   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/09 | EQ1       | 187   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/10 | EQ1       | 207   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/11 | EQ1       | 227   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/12 | EQ1       | 247   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/13 | EQ1       | 265   |  |  |
+------------+-----------+-------+--+--+
|            |           |       |  |  |
+------------+-----------+-------+--+--+
| 2019/01/01 | EQ2       | 150   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/02 | EQ2       | 168   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/03 | EQ2       | 187   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/04 | EQ2       | 205   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/05 | EQ2       | 222   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/06 | EQ2       | 239   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/07 | EQ2       | 255   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/08 | EQ2       | 272   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/09 | EQ2       | 289   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/10 | EQ2       | 306   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/11 | EQ2       | 323   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/12 | EQ2       | 340   |  |  |
+------------+-----------+-------+--+--+
| 2019/01/13 | EQ2       | 357   |  |  |
+------------+-----------+-------+--+--+
ceds
  • 2,097
  • 5
  • 32
  • 50
  • 1
    Your initial thinking is correct, and a calendar table is one way to go here. But also, you have a gaps and islands problem. The interpolated values would be the averages between the ends of each of the islands. – Tim Biegeleisen Jul 31 '19 at 14:21

3 Answers3

1

You can try this query.

DECLARE @SampleTable TABLE ( [Date] Date, Equipment VARCHAR(10),  Hours INT)
INSERT INTO @SampleTable VALUES
('2019/01/01','EQ1', 50 ),
('2019/01/02','EQ1', 67 ),
('2019/01/03','EQ1', 87 ),
('2019/01/04','EQ1', 105),
('2019/01/07','EQ1', 150),
('2019/01/08','EQ1', 169),
('2019/01/09','EQ1', 187),
('2019/01/12','EQ1', 247),
('2019/01/13','EQ1', 265),

('2019/01/01','EQ2', 150),
('2019/01/02','EQ2', 168),
('2019/01/03','EQ2', 187),
('2019/01/04','EQ2', 205),
('2019/01/05','EQ2', 222),
('2019/01/06','EQ2', 239),
('2019/01/07','EQ2', 255),
('2019/01/10','EQ2', 306),
('2019/01/13','EQ2', 357)


;WITH CTE AS (
    SELECT MIN([Date]) [Date], Equipment FROM @SampleTable T GROUP BY Equipment 
    UNION ALL
    SELECT DATEADD(DAY,1,CTE.[Date]),  CTE.Equipment FROM CTE 
        WHERE EXISTS( SELECT * FROM @SampleTable T WHERE T.Equipment = CTE.Equipment and DATEADD(DAY,1,CTE.[Date] ) <= T.[Date]  )
)
SELECT  CTE.[Date], CTE.Equipment, 
    X1.Hours +  
        DATEDIFF(DAY, X1.[Date],CTE.[Date]) * 
        CASE WHEN DATEDIFF(DAY, X1.[Date],X2.[Date]) > 0 
            THEN (X2.Hours - X1.Hours ) / DATEDIFF(DAY, X1.[Date], X2.[Date]) 
            ELSE X1.Hours END [Hours]
    FROM CTE
        OUTER APPLY( SELECT TOP 1 * FROM @SampleTable S1 WHERE S1.Equipment = CTE.Equipment and CTE.[Date]  >= S1.[Date] ORDER BY S1.Date DESC) X1
        OUTER APPLY( SELECT TOP 1 * FROM @SampleTable S1 WHERE S1.Equipment = CTE.Equipment and CTE.[Date]  <= S1.[Date] ORDER BY S1.Date ASC ) X2
ORDER BY CTE.Equipment, CTE.[Date]

Result:

Date       Equipment  Hours
---------- ---------- -----------
2019-01-01 EQ1        50
2019-01-02 EQ1        67
2019-01-03 EQ1        87
2019-01-04 EQ1        105
2019-01-05 EQ1        120
2019-01-06 EQ1        135
2019-01-07 EQ1        150
2019-01-08 EQ1        169
2019-01-09 EQ1        187
2019-01-10 EQ1        207
2019-01-11 EQ1        227
2019-01-12 EQ1        247
2019-01-13 EQ1        265

2019-01-01 EQ2        150
2019-01-02 EQ2        168
2019-01-03 EQ2        187
2019-01-04 EQ2        205
2019-01-05 EQ2        222
2019-01-06 EQ2        239
2019-01-07 EQ2        255
2019-01-08 EQ2        272
2019-01-09 EQ2        289
2019-01-10 EQ2        306
2019-01-11 EQ2        323
2019-01-12 EQ2        340
2019-01-13 EQ2        357
Serkan Arslan
  • 13,158
  • 4
  • 29
  • 44
0

Below is a prototype logic to solve your problem.

This logic assumes that you have a Dates table (could be a table variable, temp table etc.). You can find code online on how to create one (a simple way: How to create a Calendar table for 100 years in Sql)

-- 3. Final result: should return values only for missing days 
SELECT DT.Date, Filterred.Equipment,
    -- Logic: Hours value at the start of the gap + ( number of days between the start and "current" date * average hours change )
    FilterredGaps.[Hours] + DATEDIFF( DAY, FilterredGaps.[Date], DT.[Date] ) * AvgHoursChange
FROM
    -- 2. Filter out consecutive days and calculate Avg Hour Change
    ( SELECT *,
        -- Calculate avg daily change (if you have duplicate dates for a given Equipment, you may get devide by zero errors)
        (( NextHours - Hours ) / DATEDIFF( DAY, [Date], NextDate )) AS AvgHoursChange
    FROM
        -- 1. Find gaps
        ( SELECT *,
            -- Find next date and next hours value
            LEAD( [Date] ) OVER ( PARTITION BY Equipment ORDER BY [Date] ) AS NextDate,
            LEAD( [Hours] ) OVER ( PARTITION BY Equipment ORDER BY [Date] ) AS NextHours,
        FROM EquipmentTable ) AS Gaps
    -- Leave only gaps of more than 1 day
    WHERE DATEADD( DAY, 1, [Date] ) < NextDate ) AS FilterredGaps
        -- Finally join filterred gaps to the dates table to get only missing dates
        INNER JOIN DatesTable AS DT ON FilterredGaps.[Date] < DT.[Date] AND DT.[Date] < FilterredGaps.[Date]

Idea taken from: https://www.mssqltips.com/sqlservertutorial/9130/sql-server-window-functions-gaps-and-islands-problem/ I strongly recommend that you read the article to familiarise yourself with the problem and proposed solution.

Note: this code was not tested

Alex
  • 4,885
  • 3
  • 19
  • 39
0

I would create a View and use a table valued Crontab function based on https://github.com/atifaziz/NCrontab/wiki/SQL-Server-Crontab to produce sequences of date/times which then is LEFT OUTER JOIN-ed.

The calculated value needs the previous date (of the current equipment) that has a value, the next date (of the current equipment) that has a value, the date of the current row and the previous and next value. Which I would implement as 2 scalar valued functions (which may return NULL if they have no previous or next). One getting the previous/next date (parameters: @currentDate and a BIT @next (otherwise the previous is returned)) and one getting the previous/next number of hours (same parameters). The result could also be a combinated string of date and number of hours and then parsed - best measure what performes better. The next date logic returns this date if the current date has a value.

Then create a scalar valued function that takes these values and performs a calculation like this (verify that I didn't made any mistakes):

myGapInDays = NextDate - PreviousDate
myHourDiff = NextHours - PreviousHours
myIncrementPerDay (FLOAT) = myHourDiff / myGapInDays
myFactor = CurrentDate - PreviousDate
myResult = PreviousHours + Round(myFactor * myIncrementPerDay)

I hope that helps.

Christoph
  • 3,322
  • 2
  • 19
  • 28