2

I am working with time series data that are results of sensor measurements. I need to identify cases where the data flatlines- indicating a sensor malfunction. I want to select where there are more than 3 unchanging values in a row within the last 24 hours.

I think I might need to loop through, but I haven't worked with loops in sql. I assume I'll need to use a subquery to ORDER BY the DateTime. I've also looked at LEAD and LAG. Furthermore, I need to distinguish by SiteID and VariableID, which I think can be done using PARTITION.

Data looks like:

**SiteID**VariableID**DateTime**Value**
   5    1   2014-01-27 12:15    5.576
   5    1   2014-01-27 12:30    5.487
   5    1   2014-01-27 12:45    5.573
   5    1   2014-01-27 13:00    5.903
   5    87  2014-01-27 12:15    -273.2
   5    87  2014-01-27 12:30    -273.2
   5    87  2014-01-27 12:45    -273.2
   5    87  2014-01-27 13:00    -273.2
   5    88  2014-01-27 12:15    -273.2
   5    88  2014-01-27 12:30    -273.2
   5    88  2014-01-27 12:45    -273.2
   5    88  2014-01-27 13:00    -273.2
   5    89  2014-01-27 12:15    -273.2
   5    89  2014-01-27 12:30    -273.2
   5    89  2014-01-27 12:45    -273.2
   5    89  2014-01-27 13:00    -273.2
   5    2   2014-01-27 12:15    30.61
   5    2   2014-01-27 12:30    38.73
   5    2   2014-01-27 12:45    32.84
   5    2   2014-01-27 13:00    31.62
   5    3   2014-01-27 12:15    -9.53
   5    3   2014-01-27 12:30    -8.61
   5    3   2014-01-27 12:45    -8.76
   5    3   2014-01-27 13:00    -9.32
   5    4   2014-01-27 12:15    0.298
   5    4   2014-01-27 12:30    0.32
   5    4   2014-01-27 12:45    0.317
   5    4   2014-01-27 13:00    0.302

I want to generate something like:

**SiteID**VariableID**StartingDateTime**ValueCount**Value**
    5          87     2014-1-27 12:15       4         -273.4
    5          88     2014-1-27 12:15       4         -273.4
    5          89     2014-1-27 12:15       4         -273.4

1 Answers1

1

SQL Fiddle

Using this schema and data (slightly modified, just to make sure everything works):

CREATE TABLE TimeSeries (
  SiteId INT,
  VariableId INT,
  DateTime DATETIME,
  Value NUMERIC(15,5)
);

INSERT INTO TimeSeries VALUES (    5, 1   , '2014-01-27 12:15' ,   5.576     );
INSERT INTO TimeSeries VALUES (    5, 1   , '2014-01-27 12:30' ,   5.487     );
INSERT INTO TimeSeries VALUES (    5, 1   , '2014-01-27 12:45' ,   5.573     );
INSERT INTO TimeSeries VALUES (    5, 1   , '2014-01-27 13:00' ,   5.903     );
INSERT INTO TimeSeries VALUES (    5, 87  , '2014-01-27 12:15' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 87  , '2014-01-27 12:30' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 87  , '2014-01-27 12:45' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 87  , '2014-01-27 13:00' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 88  , '2014-01-27 12:15' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 88  , '2014-01-27 12:30' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 88  , '2014-01-27 12:45' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 88  , '2014-01-27 13:00' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 89  , '2014-01-27 12:15' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 89  , '2014-01-27 12:30' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 89  , '2014-01-27 12:45' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 89  , '2014-01-27 13:00' ,   -273.2    );
INSERT INTO TimeSeries VALUES (    5, 2   , '2014-01-27 12:15' ,   30.61     );
INSERT INTO TimeSeries VALUES (    5, 2   , '2014-01-27 12:30' ,   38.73     );
INSERT INTO TimeSeries VALUES (    5, 2   , '2014-01-27 12:45' ,   32.84     );
INSERT INTO TimeSeries VALUES (    5, 2   , '2014-01-27 13:00' ,   31.62     );
INSERT INTO TimeSeries VALUES (    5, 3   , '2014-01-27 12:15' ,   -9.53     );
INSERT INTO TimeSeries VALUES (    5, 3   , '2014-01-27 12:30' ,   -8.61     );
INSERT INTO TimeSeries VALUES (    5, 3   , '2014-01-27 12:45' ,   -8.76     );
INSERT INTO TimeSeries VALUES (    5, 3   , '2014-01-27 13:00' ,   -9.32     );
INSERT INTO TimeSeries VALUES (    5, 4   , '2014-01-27 12:15' ,   0.298     );
INSERT INTO TimeSeries VALUES (    5, 4   , '2014-01-27 12:30' ,   0.32      );
INSERT INTO TimeSeries VALUES (    5, 4   , '2014-01-27 12:45' ,   0.317     );
INSERT INTO TimeSeries VALUES (    5, 4   , '2014-01-27 13:00' ,   0.302     );

-- Just to make sure the query works
INSERT INTO TimeSeries VALUES (    5, 89  , '2014-01-27 18:30' ,   10        );
INSERT INTO TimeSeries VALUES (    5, 89  , '2014-01-27 19:00' ,   -273.2    ); -- this is not a contiguous value

Query:

WITH Sequences AS (
  SELECT
    T.*,
    ROW_NUMBER() OVER (PARTITION BY SiteId, VariableId, Value ORDER BY DateTime) AS RNO,
    ROW_NUMBER() OVER (ORDER BY SiteId, VariableId, DateTime) AS RNE
  FROM
    TimeSeries T
)
SELECT
  S.SiteId,
  S.VariableId,
  S.Value,
  MIN(S.DateTime) AS [Start],
  MAX(S.DateTime) AS [End],
  COUNT(*) AS ValueCount
FROM
  Sequences S
GROUP BY
  S.SiteId,
  S.VariableId,
  S.Value,
  S.RNE - S.RNO
HAVING
  COUNT(*) > 1

Results:

| SITEID | VARIABLEID |  VALUE |                          START |                            END | VALUECOUNT |
|--------|------------|--------|--------------------------------|--------------------------------|------------|
|      5 |         87 | -273.2 | January, 27 2014 12:15:00+0000 | January, 27 2014 13:00:00+0000 |          4 |
|      5 |         88 | -273.2 | January, 27 2014 12:15:00+0000 | January, 27 2014 13:00:00+0000 |          4 |
|      5 |         89 | -273.2 | January, 27 2014 12:15:00+0000 | January, 27 2014 13:00:00+0000 |          4 |

You can see that only 4 records are found for VariableId = 89 (since the last 2 records that I added should not be considered).

Based on this SO answer and this blog post.

Community
  • 1
  • 1
rsenna
  • 11,775
  • 1
  • 54
  • 60