0

I do have a (SQL Server) database table that contains an activity log of a device. The table consists of a DeviceId, a timestamp (DateTime) and a value field. The device writes its state changes to the DB whenever it turns on or off (value 1 resp. 0).

Now I wonder what would be the fastest way to get "blocks of activity" from that table. What do I mean by that? I'd like to get all time periods that are defined by a "1" value and its subsequent "0" value for a given DeviceId, so that I get a list of time ranges like this (for the active blocks, the inactive times would be between a 0 value followed by a 1):

DateTime ActiveStart, DateTime ActiveEnd

I currently ended up by first getting all the entries with EF as a list, then looping over them and comparing each entry to its predecessor in order to check if the device had been turned on and off.

That does work, but I do think that there must be a better and more performant way of doing this. What would be the best way to do it? Either a pure SQL query (from which I could build me a Stored Procedure) or a LINQ to SQL query will do.

Thanks for your thoughts and comments!

Rob
  • 11,492
  • 14
  • 59
  • 94
  • 2
    Please edit your question and provide sample data and desired results. These really help other people understand what you mean by something like "blocks of activity" and what the results should look like. – Gordon Linoff Apr 04 '16 at 14:09
  • 1
    And please state the version of your SQL-Server... Are you sure, that there is always a 1 followed by a 0 per device or might there be error data as well? – Shnugo Apr 04 '16 at 14:29
  • Thanks for bringing the error to the table - yes, there may be errors and/or logically inconsistent values, like multiple consecutive same values (although they shouldn't exist, of course, but I can't change this part). – Rob Apr 04 '16 at 19:44
  • Hey, Robert, I updated my query to handle repeating values better, my query returns rows for repeating values and NULL for dates when corresponding value is missing. I see @Shnugo 's query handles data errors differently. You might want to check both queries and choose one that fits better. – Alexey Apr 05 '16 at 06:47
  • @Alexey My solution doesn't handle erronous data at all actually :-) It's just a *side-by-side* numbering. Robert, if you want more help you really should state the version of your SQL Server. Functions like `LAG()` are bound to higher versions and I do not want to waste my time... – Shnugo Apr 05 '16 at 07:06

3 Answers3

2
--------------------------
------ sample data -------
--------------------------
declare @t table 
(
    DeviceId int,
    Timestamp DateTime,
    Value bit
)

insert into @t values
(1, '2016-01-01', 1),
(1, '2016-01-05', 1),
(1, '2016-01-07', 1),
(1, '2016-01-08', 0),
(1, '2016-01-10', 0),
(1, '2016-01-21', 0),
(1, '2016-01-22', 1),
(1, '2016-01-25', 0),
(2, '2016-01-02', 1),
(2, '2016-01-04', 0),
(2, '2016-01-06', 1),
(2, '2016-01-08', 0),
(2, '2016-01-09', 1),
(2, '2016-01-15', 0),
(2, '2016-01-18', 1)

--------------------------
---------- query ---------
--------------------------

select
    DeviceId,
    gr,
    ActiveStart = max(case when Value = 1 then Timestamp end),
    ActiveEnd = max(case when Value = 0 then Timestamp end)
from
(
    select 
        *,
        gr = Value + row_number() over(partition by DeviceId order by Timestamp)
    from @t
) t
group by DeviceId, gr
-- optional sorting by dates for easier results evaluation:
--order by DeviceId,
--  case when max(case when value = 1 then Timestamp end) is NULL
--  then max(case when value = 0 then Timestamp end)
--  else max(case when value = 1 then Timestamp end) end
Alexey
  • 909
  • 6
  • 11
  • Thanks, Alexey! Your answer works really well and also takes into consideration missing values at the beginning. – Rob Apr 05 '16 at 07:20
  • I like this, upvote from my side! One tiny hint: avoid date literals which are culture specific. Your statement doesnt not work on my (german) machine without setting the language to `ENGLISH`. If you need literal dates, you might want to read this: http://stackoverflow.com/a/34275965/5089204 – Shnugo Apr 05 '16 at 07:42
  • Thanks @Shnugo. I didn't know about that. – Alexey Apr 05 '16 at 08:03
1

You might try it like this:

CREATE TABLE #deviceLog (DeviceID INT, Activity DATETIME,Stat INT);
INSERT INTO #deviceLog VALUES
 (1,{ts'2016-04-04 11:20:00'},1)
,(1,{ts'2016-04-04 11:30:00'},0)
,(1,{ts'2016-04-04 11:33:00'},1)
,(1,{ts'2016-04-04 11:38:00'},0)
,(2,{ts'2016-04-04 12:33:00'},1)
,(2,{ts'2016-04-04 12:40:00'},0)
,(3,{ts'2016-04-04 10:33:00'},1)
,(3,{ts'2016-04-04 11:38:00'},0);

WITH AllOn AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY DeviceID ORDER BY Activity) AS Inx,*
    FROM #deviceLog
    WHERE Stat=1
)
,AllOff AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY DeviceID ORDER BY Activity) AS Inx,*
    FROM #deviceLog
    WHERE Stat=0
)
SELECT AllOn.*,AllOff.Activity AS OffActivity
FROM AllOn
INNER JOIN AllOff ON AllOn.DeviceID=AllOff.DeviceID AND AllOn.Inx=AllOff.Inx;

DROP TABLE #deviceLog;

The result

Inx DeviceID    Activity          Stat  OffActivity
 1     1    2016-04-04 11:20:00.000 1   2016-04-04 11:30:00.000
 2     1    2016-04-04 11:33:00.000 1   2016-04-04 11:38:00.000
 1     2    2016-04-04 12:33:00.000 1   2016-04-04 12:40:00.000
 1     3    2016-04-04 10:33:00.000 1   2016-04-04 11:38:00.000
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Thanks for your answer. It works, but things get mixed up when the first *stat* isn't 1. – Rob Apr 05 '16 at 07:12
  • @Robert, as commented some minutes ago, this solution relys on strict integrity of on/off per device. The records must not be inserted in order but a numbering per device and stat **must** lead to a chain of `10101010`. – Shnugo Apr 05 '16 at 07:15
  • @Robert an error safe approach might be: Select all on-records and find their next off-record (orderd by device and date). This would - at least - sort an on to an off in any case. If they really are pairs is not sure. – Shnugo Apr 05 '16 at 07:20
0

SQL Server 2012+ supports cumulative sums. You can get blocks of activity by counting the number of *0*s cumulatively. A block of activity will have a constant value. Then you can aggregate (and filter) to get the periods of activity:

select deviceid, min(timestamp), max(timestamp)
from (select t.*,
             sum(case when value = 0 then 1 else 0 end) over
                 (partition by deviceid order by timestamp) as grp
      from t
     ) t
where value <> 0
group by deviceid, grp;

In earlier versions of SQL Server, you can do something similar using outer apply (and by other methods).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks, I will look at that tomorrow! It sounds like exactly what I'm after. – Rob Apr 04 '16 at 19:40
  • 1
    @Robert It doesn't seem to produce correct results as far as I can tell. – Alexey Apr 04 '16 at 21:52
  • @Alexey . . . Can you elaborate? – Gordon Linoff Apr 04 '16 at 22:29
  • 1
    @GordonLinoff For example, if there are two rows (1, '2016-04-02', 1) and (1, '2016-04-05', 0), the query produces result (1, '2016-04-02', '2016-04-02') but it should be (1, '2016-04-02', '2016-04-05') - the date in the third column is not correct. – Alexey Apr 05 '16 at 05:46
  • @Alexey . . . Perhaps the OP agrees with you but that is definitely not how I read the question. In any case, that actually isn't a hard adjustment to this method. – Gordon Linoff Apr 05 '16 at 14:54