1

I've got some sensory information going into a table. I have figured out the query that will tell me exactly when the value at a particular device changes.

What I need to know is the status of all of the other sensors at that time. The trick is, the timestamps won't be equal. I could get a data point from sensor 1, then 3 minute later, one from sensor 2, and then 30 seconds later, another from sensor 1.

So, here is an example of what I am talking about:

--- data_table ---

sensor | state | stime
-------+-------+---------------------
     1 |     A | 2014-08-17 21:42:00
     1 |     A | 2014-08-17 21:43:00
     2 |     B | 2014-08-17 21:44:00
     3 |     C | 2014-08-17 21:45:00
     2 |     D | 2014-08-17 21:46:00
     3 |     C | 2014-08-17 21:47:00
     1 |     B | 2014-08-17 21:48:00
     3 |     A | 2014-08-17 21:49:00
     2 |     D | 2014-08-17 21:50:00
     2 |     A | 2014-08-17 21:51:00

Now, I know the query that will deliver me the state changes. I've got this down, and it's in a view. That table would look like:

 --- state_changed_view ---

sensor | state | stime
-------+-------+---------------------
     2 |     D | 2014-08-17 21:46:00
     1 |     B | 2014-08-17 21:48:00
     3 |     A | 2014-08-17 21:49:00
     2 |     A | 2014-08-17 21:51:00 

What I want is a JOIN, where I can get all of the values of the 'state_changed_view', but also the values of the other corresponding sensors at the 'sensor_timestamp' within the view.

So, ideally, I want my result to look like (or something similar to):

sensor | state | stime               | sensor | state | stime
-------+-------+---------------------+--------+-------+---------------------
     2 |     D | 2014-08-17 21:46:00 |      1 |     A | 2014-08-17 21:43:00
     2 |     D | 2014-08-17 21:46:00 |      2 |     D | 2014-08-17 21:46:00
     2 |     D | 2014-08-17 21:46:00 |      3 |     C | 2014-08-17 21:45:00
     1 |     B | 2014-08-17 21:48:00 |      1 |     B | 2014-08-17 21:48:00
     1 |     B | 2014-08-17 21:48:00 |      2 |     D | 2014-08-17 21:46:00
     1 |     B | 2014-08-17 21:48:00 |      3 |     C | 2014-08-17 21:47:00
     3 |     A | 2014-08-17 21:49:00 |      1 |     B | 2014-08-17 21:48:00
     3 |     A | 2014-08-17 21:49:00 |      2 |     D | 2014-08-17 21:46:00 
     3 |     A | 2014-08-17 21:49:00 |      3 |     A | 2014-08-17 21:49:00 
     2 |     A | 2014-08-17 21:51:00 |      1 |     B | 2014-08-17 21:48:00 
     2 |     A | 2014-08-17 21:51:00 |      2 |     A | 2014-08-17 21:51:00 
     2 |     A | 2014-08-17 21:51:00 |      3 |     A | 2014-08-17 21:49:00

As you can see, I need the most recent row in 'data_table' for each sensor, for every row that exists in state_changed_view.

I just don't know how to get the SQL to get me the most recent row according to a particular timestamp.

This is on a PL/pgSQL system, so anything compatible with Postgres is handy.

user272735
  • 10,473
  • 9
  • 65
  • 96
jasonmclose
  • 1,667
  • 4
  • 22
  • 38

3 Answers3

3

Query

For a small, given set of sensors to retrieve (this works for Postgres 8.4 or later):

SELECT c.sensor AS sensor_change
     , d1.state AS state_1, d1.stime AS stime_1
     , d2.state AS state_2, d2.stime AS stime_2
     , d3.state AS state_3, d3.stime AS stime_3
FROM  (
   SELECT sensor, stime
        , lag(state) OVER (PARTITION BY sensor ORDER BY stime)
           <> state AS change
        , max(CASE WHEN sensor = 1 THEN stime ELSE NULL END) OVER w AS last_1
        , max(CASE WHEN sensor = 2 THEN stime ELSE NULL END) OVER w AS last_2
        , max(CASE WHEN sensor = 3 THEN stime ELSE NULL END) OVER w AS last_3
   FROM   data d
   WINDOW w AS (ORDER BY stime)
   ) c
JOIN   data d1 ON d1.sensor = 1 AND d1.stime = c.last_1
JOIN   data d2 ON d2.sensor = 2 AND d2.stime = c.last_2
JOIN   data d3 ON d3.sensor = 3 AND d3.stime = c.last_3
WHERE  c.change
ORDER  BY c.stime;

Not using the view at all, building on the table directly, that's faster.

This is assuming a UNIQUE INDEX on (sensor, stime) to be unambiguous. Performance also heavily depends on such an index.

As opposed to @Nick's solution, building on JOIN LATERAL (Postgres 9.3+), this returns a single row with all values for every change in state.

PL/pgSQL function

Since you mentioned PL/pgSQL, I would expect this (highly optimized) plpgsql function to perform better, since it can make do with a single sequential scan of the table:

CREATE OR REPLACE FUNCTION f_sensor_change()
  RETURNS TABLE (sensor_change int   -- adapt to actual data types!
               , state_1 "char", stime_1 timestamp
               , state_2 "char", stime_2 timestamp
               , state_3 "char", stime_3 timestamp) AS
$func$
DECLARE
   r    data%rowtype;
BEGIN

FOR r IN
   TABLE data ORDER BY stime
LOOP
   CASE r.sensor
   WHEN 1 THEN  
      IF    r.state =  state_1 THEN  -- just save stime
         stime_1 := r.stime;
      ELSIF r.state <> state_1 THEN  -- save all & RETURN
         stime_1 := r.stime; state_1 := r.state;
         sensor_change := 1; RETURN NEXT;
      ELSE                           -- still NULL: init
         stime_1 := r.stime; state_1 := r.state;
      END IF;

   WHEN 2 THEN
      IF    r.state =  state_2 THEN
         stime_2 := r.stime;
      ELSIF r.state <> state_2 THEN
         stime_2 := r.stime; state_2 := r.state;
         sensor_change := 2; RETURN NEXT;
      ELSE
         stime_2 := r.stime; state_2 := r.state;
      END IF;

   WHEN 3 THEN
      IF    r.state =  state_3 THEN
         stime_3 := r.stime;
      ELSIF r.state <> state_3 THEN
         stime_3 := r.stime; state_3 := r.state;
         sensor_change := 3; RETURN NEXT;
      ELSE
         stime_3 := r.stime; state_3 := r.state;
      END IF;
   ELSE             -- do nothing, ignore other sensors
   END CASE;
END LOOP;

END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_sensor_change();

Makes sense for repeated use. Related answer:

SQL Fiddle for Postgres 9.3.
SQL Fiddle for Postgres 8.4.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I have to tell you that this was absolutely impressive. It works for my small number of sensors. I know I need to upgrade Postgres (nearly every time I post on here, I get that remark), but that upgrade will be huge. Fortunately, it is coming soon. Thanks so much for your work. This was fantastic. – jasonmclose Aug 19 '14 at 19:34
  • @jasonmclose: Note the further simplification to the query. Plus, I added a plpgsql variant. I would be very interested which performs better in your case. Would you mind running `EXPLAIN ANALYZE` and leaving a comment here with the results (best of 5 or something to rule out caching artifacts). – Erwin Brandstetter Aug 19 '14 at 20:43
  • Thanks for this. I will try to get this in some time today or tomorrow. I am in a time crunch at the moment. – jasonmclose Aug 20 '14 at 13:58
1

There are a couple of things making this not-so-straightforward:

  • You want to do a subquery for each state_changed_view row, but the subquery must mention the corresponding stime from the view (to restrict it to earlier records). Ordinary subqueries aren't allowed to depend on external fields, but you can accomplish this (as of Postgres 9.3, at least) with a lateral join.
  • You need not only MAX(data_table.stime), but the corresponding data_table.state. You could do this with another nested query to retrieve the rest of the row, but SELECT DISTINCT ON gives you an easy way to fetch the whole thing at once.

The end result is something like this:

SELECT *
FROM
  state_changed_view,
  LATERAL (
    SELECT DISTINCT ON (sensor)
      sensor,
      state,
      stime
    FROM
      data_table
    WHERE
      data_table.stime <= state_changed_view.stime
    ORDER BY
      sensor,
      stime DESC
  ) a
Nick Barnes
  • 19,816
  • 3
  • 51
  • 63
  • Thank you for the response. Unfortunately, I must be running on 8.4, and cannot upgrade for business reasons. This always seems to be an issue, where the answer to my question is always easily gained by a release that is yet to be supported for my system. – jasonmclose Aug 19 '14 at 18:50
  • @jasonmclose: I've got a solution in 8.4 for you, but you really should consider upgrading. [8.4 has reached reaching EOL in July](http://www.postgresql.org/support/versioning/). I like this elegant solution, btw. – Erwin Brandstetter Aug 19 '14 at 18:58
0

Start by finding the max time for each sensor and state with a subquery which groups on sensor and state and then join that to the view

SELECT *
FROM 
(SELECT sensor, state, MAX(stime) as stime
from data_table
group by sensor, state) a
join state_changed_view on 1=1
Hedinn
  • 864
  • 4
  • 7
  • When I try this on my real data, it returns a lot more data than the mapping I would like. It is close though. Definitely more than I've got so far. – jasonmclose Aug 19 '14 at 16:01