1

I am trying to implement a function that calculates the spare time out of stored activities start and end times. I implemented my database on PostgreSQL 9.5.3. This is how the activity table looks like

activity_id | user_id   | activity_title                     | starts_at                     | ends_at 

(serial)    | (integer) | (text)                             | (timestamp without time zone) |(timestamp without time zone)
---------------------------------------------------------------------------------------------------------------------------
1           | 1         | Go to school                       | 2016-06-12 08:00:00           | 2016-06-12 14:00:00
2           | 1         | Visit my uncle                     | 2016-06-12 16:00:00           | 2016-06-12 17:30:00
3           | 1         | Go shopping                        | 2016-06-12 18:00:00           | 2016-06-12 21:15:00
4           | 1         | Go to Library                      | 2016-06-13 10:00:00           | 2016-06-13 12:00:00
5           | 1         | Install some programs on my laptop | 2016-06-13 18:00:00           | 2016-06-13 19:00:00

Actual table definition of my real table:

CREATE TABLE public.activity (
  activity_id serial,
  user_id integer NOT NULL,
  activity_title text,
  starts_at timestamp without time zone NOT NULL,
  start_tz text NOT NULL,
  ends_at timestamp without time zone NOT NULL,
  end_tz text NOT NULL,
  recurrence text NOT NULL DEFAULT 'none'::text,
  lat numeric NOT NULL,
  lon numeric NOT NULL,
  CONSTRAINT pk_activity PRIMARY KEY (activity_id),
  CONSTRAINT fk_user_id FOREIGN KEY (user_id)
      REFERENCES public.users (user_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

I want to calculate every day spare time for this user using PL/pgSQL function that takes (user_id INTEGER, range_start TIMESTAMP, range_end TIMESTAMP) as parameters. I want the output of this SQL statement:

SELECT * from calculate_spare_time(1, '2016-06-12', '2016-06-13');

to be like this:

spare_time_id | user_id   | starts_at                     | ends_at 

(serial)      | (integer) | (timestamp without time zone) |(timestamp without time zone)
----------------------------------------------------------------------------------------
1             | 1         | 2016-06-12 00:00:00           | 2016-06-12 08:00:00
2             | 1         | 2016-06-12 12:00:00           | 2016-06-12 16:00:00
3             | 1         | 2016-06-12 17:30:00           | 2016-06-12 18:00:00
4             | 1         | 2016-06-12 21:15:00           | 2016-06-13 00:00:00
5             | 1         | 2016-06-13 00:00:00           | 2016-06-13 10:00:00
6             | 1         | 2016-06-13 12:00:00           | 2016-06-13 18:00:00
7             | 1         | 2016-06-13 19:00:00           | 2016-06-14 00:00:00

I have the idea of subtracting the end time of one activity from the start time of the next activity happening on the same date, but I am stuck with implementing that with PL/pgSQL especially on how to deal with 2 rows in the same time.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Hope this edit makes my question clear. – Safaa Mamdouh Salem Jun 13 '16 at 08:43
  • Much better now. Are there any constraints on the data? Can time ranges of activities overlap? Can they be adjacent? Can columns be NULL? Are `starts_at` and `ends_at` *inclusive* or *exclusive* bounds? (Lower bound inclusive, upper bound exclusive, would be the canonical default.) Is there a 15-minute grid on allowed timestamps? The actual complete table definition (`CREATE TABLE` script with all constraints) would clarify all that. The function very much depends on what data must be expected. – Erwin Brandstetter Jun 13 '16 at 14:36
  • I added the `CREATE TABLE` script to the post. Time ranges of activities can overlap only if activities take place in the same location which is represented by `lat` and `lon` columns. `starts_at` is an inclusive bound while `ends_at` is an exclusive bound. – Safaa Mamdouh Salem Jun 13 '16 at 15:39
  • I don't understand what 15-minute grid on allowed timestamps is. I don't get the point of it. – Safaa Mamdouh Salem Jun 13 '16 at 16:01
  • "15-minute grid" would mean that time values are entered in 15 minute intervals, which would allow a discrete solution. I added a general solution. – Erwin Brandstetter Jun 13 '16 at 16:34

1 Answers1

1

To simplify things, I suggest to create a view - or better yet: a MATERIALZED VIEW showing gaps in the activities per user:

CREATE MATERIALIZED VIEW mv_gap AS
SELECT user_id, tsrange(a, z) AS gap
FROM  (
   SELECT user_id, ends_at AS a
        , lead(starts_at) OVER (PARTITION BY user_id ORDER BY starts_at) AS z
   FROM   activity
   ) sub
WHERE  z > a;  -- weed out simple overlaps and the dangling "gap" till infinity

Note the range type tsrange.

ATTENTION: You mentioned possible overlaps, which complicate things. If one time range of a single user can be included in another, you need to do more! Merge time ranges to identify earliest start and latest end per block.

Remember to refresh the MV when needed.

Then your function can simply be:

CREATE OR REPLACE FUNCTION f_freetime(_user_id int, _from timestamp, _to timestamp)
  RETURNS TABLE (rn int, gap tsrange) AS
$func$
   SELECT row_number() OVER (ORDER BY g.gap)::int AS rn
        , g.gap * tsrange(_from, _to) AS gap
   FROM   mv_gap g
   WHERE  g.user_id = _user_id
   AND    g.gap && tsrange(_from, _to)
   ORDER  BY g.gap;
$func$  LANGUAGE sql STABLE;

Call:

SELECT * FROM f_freetime(1, '2016-06-12 0:0', '2016-06-13 0:0');

Note the range operators * and &&.
Also note that I use a simple SQL function, after the problem has been simplified enough. If you need to add more, you might want to switch back to plpgsql and use RETURN QUERY ...

Or just use the query without function wrapper.

Performance

If you have many rows per user, to optimize query times, add an SP-GiST index (one reason to use a MV):

CREATE INDEX activity_gap_spgist_idx on mv_gap USING spgist (gap);

In addition to an index on (user_id).
Details in this related answer:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228