4

I need help with the query, let's say that this is the data in table.

timestamp           
------------------- 
2010-11-16 10:30:00
2010-11-16 10:37:00
2010-11-16 10:40:00 
2010-11-16 10:45:00
2010-11-16 10:48:00
2010-11-16 10:55:00
2010-11-16 10:56:00

I want to get every first row (timestamp) that is at least 5 minutes later than the last. In this case the query should return:

timestamp           
------------------- 
2010-11-16 10:30:00
2010-11-16 10:37:00
2010-11-16 10:45:00
2010-11-16 10:55:00
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
user20902
  • 157
  • 1
  • 10
  • Is this Postgres or MySql? – Mike Christensen Dec 16 '13 at 22:49
  • 1
    I need it for postgres, but any suggestion is acceptable because i am completely out of ideas. – user20902 Dec 16 '13 at 22:50
  • 1
    5 minutes greater than the previous row in the source table, or 5 minutes greater than the last row in the result table? e.g. if you put an extra row on the end of the data at 11:00:00 do you expect it to be in the resultset? – abasterfield Dec 16 '13 at 23:05
  • 5 minutes greater than the last row in the result table, which means that 11:00:00 should be in the query results. – user20902 Dec 16 '13 at 23:07

1 Answers1

5

Recursive CTE

Since each row depends on the one before, it is hard to solve with a set-based approach. Resorting to a recursive CTE (which is standard SQL):

WITH RECURSIVE cte AS (
   (
   SELECT ts FROM tbl
   ORDER BY ts
   LIMIT 1
   )

   UNION ALL
   (
   SELECT t.ts
   FROM   cte c
   JOIN   tbl t ON t.ts >= c.ts + interval '5 min'
   ORDER  BY t.ts
   LIMIT 1
   )
   )
TABLE cte ORDER BY ts;

Aggregate functions are not allowed in a recursive CTE. I substituted with ORDER BY / LIMIT 1, which is fast when supported by an index on ts.

The parentheses around each leg of the UNION query are necessary to allow LIMIT, which would otherwise only be permitted once at the end of a UNION query.

PL/pgSQL function

A procedural solution (example with a plpgsql function) iterating through the sorted table would probably be a lot faster, since it can make do with a single table scan:

CREATE OR REPLACE FUNCTION f_rowgrid(i interval)
  RETURNS SETOF timestamp
  LANGUAGE plpgsql AS
$func$
DECLARE
   _this  timestamp;
   _last  timestamp := '-infinity';     -- init so that 1 row passes
BEGIN
   FOR _this IN
       SELECT ts FROM tbl ORDER BY 1
   LOOP
       IF _this >= _last + i THEN
          RETURN NEXT _this;
          _last := _this;
       END IF;
   END LOOP;
END
$func$;

Call:

SELECT * FROM  f_rowgrid('5 min');

db<>fiddle here - demonstrating both
Old sqlfiddle

Here is a more complex example for this type of plpgsql functions:

Could easily be made generic with dynamic SQL and EXECUTE to work for arbitrary tables.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for your answer, the recursive query is doing the work...I just now need to find the way to execute this query in PHP. – user20902 Dec 16 '13 at 23:58