How do rank records that exceed a 5 minute interval T-SQL

Question

I have the following dataset:

Henry   2014-10-24 00:02:28.000 1.00
Henry   2014-10-24 00:07:28.000 2.00
Henry   2014-10-24 00:12:28.000 1.00
Henry   2014-10-24 00:17:28.000 1.00
Henry   2014-10-24 00:22:28.000 4.00
Henry   2014-10-24 00:27:28.000 1.00
Henry   2014-10-24 01:44:28.000 2.00
Henry   2014-10-24 01:49:28.000 1.00
Henry   2014-10-24 01:54:28.000 1.00

Rows 1 to 6 are at a 5 minute interval, and then from 7 to 9 they jump an hour. This could vary, and what I would like to do is detect when a dataset breaks the 5 minute interval. I am looking at DENSE_RANK and ROW_NUMBER but not sure if this is the right direction.

Any help would be appreciated.

Thank you.

Which version of `SQL Server` are you using -- 2012 by chance? — sgeddes, Nov 11 '14 at 15:06
What defines a data set? are all "Henry" records relevant in the data set vs perhaps all "Sally" records? and what are the column names in the table? breaking the data into two sets and doing a left join base set base set and a correlated sub query seems like it may work as well. However a windowed partition using lag and lead would likely be the most efficient. — xQbert, Nov 11 '14 at 15:10
What is the beginning of the "dataset" and what is the end? So how do you calcuate the timespan between them? Do you look to the last record or to the first of the last dataset? — Tim Schmelter, Nov 11 '14 at 15:11
http://blog.sqlauthority.com/2013/09/22/sql-server-how-to-access-the-previous-row-and-next-row-value-in-select-statement/ perhaps would help — xQbert, Nov 11 '14 at 15:15
I tried to simpify my question, maybe a little too much. But the columns are as follows: Target, PollTime, SampleValue. I am using CTE to join the data I need, then I created a RankData common table to apply my rank against the dataset above. And we are using SQL Server 2008 R2 — Pete Dragun, Nov 11 '14 at 15:24

score 1 · Accepted Answer · edited May 23 '17 at 11:49

Since you don't specify your table or column names, I'm going to assume you have a table Times with the following schema:

Name varchar(50)
RecordDate datetime
Amount decimal(8,6)

With that knowledge, you can use either a CROSS APPLY (similar to INNER JOIN) or OUTER APPLY (similar to a LEFT JOIN) to compare the table to itself while including some crafty filtering criteria:

SELECT *, DATEDIFF(minute, t2.PreviousRecordDate, t1.RecordDate) [Interval]
FROM Times t1
    CROSS APPLY  (
        SELECT TOP 1 RecordDate [PreviousRecordDate]
        FROM Times temp_times
        WHERE temp_times.Name = t1.Name
        AND temp_times.RecordDate < t1.RecordDate
        ORDER BY temp_times.RecordDate DESC
    )  t2
WHERE DATEDIFF(minute, t2.PreviousRecordDate, t1.RecordDate) > 5

This gives us the following result set:

Name     RecordDate                 Amount      PreviousRecordDate        Interval
Henry    2014-10-24 01:44:28.000    2.000000    2014-10-24 00:27:28.000   77

From there, you can modify the query to use any window functions or additional information you'd like.

Here's an example of this in action: http://www.sqlfiddle.com/#!6/c239b/3

If you are unfamiliar with APPLY and its uses compared to a regular JOIN, see the following resources:

cpacheco · Answer 2 · 2014-11-11T15:50:36.420

I'm not sure what exactly you need, but I think you may want to check out the LAG function, which lets you access a previous row in a table.

SELECT 
    name,
    dtime,
    DATEDIFF(MINUTE,LAG(dtime, 1, dtime) OVER (PARTITION BY name ORDER BY dtime), dtime) AS Diff
FROM #gg

This query will let you get the difference in minutes between each row and its previous row, then you could use it as a subquery to determine which row has a Diff > 5, or check if any row in the resultset has it. Suit it to your needs

EDIT: An Alternative method is to use a COMMON TABLE EXPRESSION combines with the ROW_NUMBER() function. This should run on SQL 2008, but I haven't tested it (it works on 2012)

;
WITH cte AS
(
    SELECT 
        name,
        dtime,
        ROW_NUMBER() OVER (ORDER BY dtime) AS rn
    FROM #gg
)
SELECT
    name,
    dtime,
    DATEDIFF(MINUTE, ISNULL((SELECT dtime FROM cte sub WHERE sub.rn = cte.rn - 1), dtime), dtime)
FROM cte

Unfortunately based on the link you provided about the LAG function it is not avaiable for SQL Server 2008 R2. Is their a viable solution for this version? — Pete Dragun, Nov 11 '14 at 15:34
I tried your suggestion and found it quite slow when applied to my real dataset. The CROSS APPLY solution works much faster. But I do thank you for your suggestion. — Pete Dragun, Nov 11 '14 at 16:44

How do rank records that exceed a 5 minute interval T-SQL

2 Answers2