2

I'd like to write a query that will calculate the total amount of activity that occurred within each 15 minute interval of the day using only timestamps that correspond to activity start and stop times.

Here is a sample data set:

    DATE    StartDateTime   StopDateTime
2/2/2015    2/2/2015 7:00   2/2/2015 7:25
2/2/2015    2/2/2015 7:20   2/2/2015 7:29
2/2/2015    2/2/2015 7:35   2/2/2015 7:42
2/2/2015    2/2/2015 8:05   2/2/2015 8:14
2/2/2015    2/2/2015 8:16   2/2/2015 8:20
2/2/2015    2/2/2015 8:29   2/2/2015 8:40
2/2/2015    2/2/2015 8:55   2/2/2015 9:25

And this is what I'd like to be able to get:

    DATE    Interval       activityTime(min)
2/2/2015    2/2/2015 7:00   15
2/2/2015    2/2/2015 7:15   19
2/2/2015    2/2/2015 7:30   7
2/2/2015    2/2/2015 7:45   0
2/2/2015    2/2/2015 8:00   9
2/2/2015    2/2/2015 8:15   5
2/2/2015    2/2/2015 8:30   10
2/2/2015    2/2/2015 8:45   5
2/2/2015    2/2/2015 9:00   15
2/2/2015    2/2/2015 9:15   10

I've searched to find a way to organize the data in the way that I need and this is the closest that I've been able to find so far though I haven't been able to get it to work:

Splitting time + duration into intervals in t-sql

I'm pretty new to SQL so any explanation of solutions would be much appreciated. This is also my first post on stackoverflow so please let me know if the data are not in the preferred format or if there any additional questions. Thanks!

Community
  • 1
  • 1
William
  • 166
  • 10
  • It is helpful to tag database questions with both the appropriate software (MySQL, Oracle, DB2, ...) and version, e.g. `sql-server-2014`. Differences in syntax and features often affect the answers. – HABO Apr 08 '15 at 23:38
  • Thank you. I've updated it to include teradata sql assistant and a sql version. – William Apr 09 '15 at 00:30
  • 1
    Do you need this for both Teradata and SQL Server? What's your Teradata release? – dnoeth Apr 09 '15 at 08:43
  • I meant to put the 14th release of teradata, not sql server. We may be transitioning to SQL Server soon but for now, teradata is what I'm using – William Apr 09 '15 at 15:03

2 Answers2

2

Assuming some reasonable recent version of SQL Server, this ought to be a good start:

-- Some sample data.
declare @Samples as Table ( SampleId Int Identity, Start DateTime, Stop DateTime );
insert into @Samples ( Start, Stop ) values
  ( '2/2/2015 7:00', '2/2/2015 7:25' ),
  ( '2/2/2015 7:20', '2/2/2015 7:29' ),
  ( '2/2/2015 7:35', '2/2/2015 7:42' ),
  ( '2/2/2015 8:05', '2/2/2015 8:14' ),
  ( '2/2/2015 8:16', '2/2/2015 8:20' ),
  ( '2/2/2015 8:29', '2/2/2015 8:40' ),
  ( '2/2/2015 8:55', '2/2/2015 9:25' );
select * from @Samples;

-- Find the limits and align them to quarter hours.
declare @Min as DateTime;
declare @Max as DateTime;
select @Min = min( Start ), @Max = max( Stop )
  from @Samples;
set @Min = DateAdd( minute, -DatePart( minute, @Min ) % 15, @Min );
set @Max = DateAdd( minute, 15 - DatePart( minute, @Max ) % 15, @Max );
select @Min as [Min], @Max as [Max];


-- Go for it.
with QuarterHours ( QuarterStart, QuarterStop )
  as (
    select @Min, DateAdd( minute, 15, @Min )
    union all
    select QuarterStop, DateAdd( minute, 15, QuarterStop )
      from QuarterHours
      where QuarterStop < @Max ),
  Overlaps
  as ( select QH.QuarterStart, QH.QuarterStop, S.Start, S.Stop,
  case
    when S.Start <= QH.QuarterStart and S.Stop >= QH.QuarterStop then 15
    when S.Start <= QH.QuarterStart and S.Stop < QH.QuarterStop then DateDiff( minute, QH.QuarterStart, S.Stop )
    when S.Start > QH.QuarterStart and S.Stop >= QH.QuarterStop then DateDiff( minute, S.Start, QH.QuarterStop )
    when S.Start > QH.QuarterStart and S.Stop < QH.QuarterStop then DateDiff( minute, S.Start, S.Stop )
    else 0 end as Overlap
  from QuarterHours as QH left outer join
    @Samples as S on S.Start <= QH.QuarterStop and S.Stop >= QH.QuarterStart )
  select QuarterStart, sum( Overlap ) as [ActivityTime]
    from Overlaps
    group by QuarterStart
    order by QuarterStart;

You can change the last select to either select * from QuarterHours or select * from Overlaps to see some of the intermediate values.

Explanatory notes:

You can use any range (@Min/@Max) you want, I just took them from the sample data so that the example would run. I used a table variable for the same reason, no need to create a "real" table for the sake of an example.

The Common Table Expression (CTE) creates, via recursion, a table of QuarterHours that covers the desired range. (A numbers table or tally table could also be used to generate the quarter hours.) Then a LEFT OUTER JOIN with the sample data is used to locate all of the Overlaps, if any, with each quarter hour. That preserves the quarter hours for which there is no activity.

The final SELECT summarizes the results.

HABO
  • 15,314
  • 5
  • 39
  • 57
  • This is a lot more complicated that I had thought or hoped it would be. It may take me some time to wrap my head around what everything means. – William Apr 09 '15 at 14:52
  • I didn't mean to send my last comment yet. Here is the whole thing: This is a lot more complicated that I had thought or hoped it would be. It may take me some time to wrap my head around what everything means. Where you inserted sample data, is it possible to insert my start and stop time variables instead of explicit values? I apologize if these are simple questions but as I said, I'm fairly new to SQL and this will be the most complicated query I've used. Thank you very much for the help! – William Apr 09 '15 at 15:01
  • @William - I added some additional notes to explain what is going on. Again, if you change the final `select` to show the `QuarterHours` and `Overlaps` tables it should be much clearer. – HABO Apr 09 '15 at 15:17
  • I wanted to write a short followup. The query that you wrote works perfectly for what I need. Thank again for the help. – William Apr 10 '15 at 12:40
1

The following query will give you each 15-minute increment that contains at least one start time and the total amount (in minutes) of activity for the entire duration that started in that 15-minute increment.

select  Date,
        Convert( SmallDatetime, Floor( Cast( StartDateTime as float ) * 96.0 ) / 96.0 ) Increment,
        Sum( DateDiff( second, StartDateTime, StopDateTime )) / 60 Duration
from    Activities
group by Date, Convert( SmallDatetime, Floor( Cast( StartDateTime as float ) * 96.0 ) / 96.0 );

Which returns this:

Date       Increment           Duration
---------- ------------------- --------
2015-02-02 2015-02-02 07:00:00 25
2015-02-02 2015-02-02 07:15:00  9
2015-02-02 2015-02-02 07:30:00  7
2015-02-02 2015-02-02 08:00:00  9
2015-02-02 2015-02-02 08:15:00 15
2015-02-02 2015-02-02 08:45:00 30

I was just looking into calculating the running total with overflow into the next increment, when a coupla things occurred to me. One is that you're going to need every 15-minute increment during the time of your query, whether any activity starts in it or not. So we would have to use a tally table to ensure every interval is generated, if nothing else, to catch some overflow minutes from the previous interval.

Then, of course, there is tracking the running total with overflow. While this is possible (see https://stackoverflow.com/a/861073/3658753 for a good explanation of how), it hit me that the combination of the two (tally table and running total) is an awful lot of overhead to be performed in SQL. Remember that performing calculations in SQL is many times faster than even the fastest disk access, but performing calculations in any high level language (Java, C++, C# or even scripting languages like Perl) is going to be many times faster than SQL. Plus the maintainability of the SQL solution will be deep in the basement.

So my recommendation at this point is to take the query above and feed it into a good reporting engine or your application and have them perform the additional calculations. Performance-wise, you'll be way ahead.

Community
  • 1
  • 1
TommCatt
  • 5,498
  • 1
  • 13
  • 20