1

Situation

We have a PostgreSQL 9.1 database containing user sessions with login date/time and logout date/time per row. Table looks like this:

    user_id |        login_ts       |         logout_ts  
------------+--------------+--------------------------------
USER1      |  2021-02-03 09:23:00  |   2021-02-03 11:44:00
USER2      |  2021-02-03 10:49:00  |   2021-02-03 13:30:00
USER3      |  2021-02-03 13:32:00  |   2021-02-03 15:31:00
USER4      |  2021-02-04 13:50:00  |   2021-02-04 14:53:00
USER5      |  2021-02-04 14:44:00  |   2021-02-04 15:21:00
USER6      |  2021-02-04 14:52:00  |   2021-02-04 17:59:00

Goal

Would like to get the max number of concurrent users for each 24 hours of each day in the time range. Like this:

date       | hour  | sessions
-----------+-------+-----------
2021-02-03 | 01:00 | 0
2021-02-03 | 02:00 | 0
2021-02-03 | 03:00 | 0
2021-02-03 | 04:00 | 0
2021-02-03 | 05:00 | 0
2021-02-03 | 06:00 | 0
2021-02-03 | 07:00 | 0
2021-02-03 | 08:00 | 0
2021-02-03 | 09:00 | 1
2021-02-03 | 10:00 | 2
2021-02-03 | 11:00 | 2
2021-02-03 | 12:00 | 1
2021-02-03 | 13:00 | 1
2021-02-03 | 14:00 | 1
2021-02-03 | 15:00 | 0
2021-02-03 | 16:00 | 0
2021-02-03 | 17:00 | 0
2021-02-03 | 18:00 | 0
2021-02-03 | 19:00 | 0
2021-02-03 | 20:00 | 0
2021-02-03 | 21:00 | 0
2021-02-03 | 22:00 | 0
2021-02-03 | 23:00 | 0
2021-02-03 | 24:00 | 0
2021-02-04 | 01:00 | 0
2021-02-04 | 02:00 | 0
2021-02-04 | 03:00 | 0
2021-02-04 | 04:00 | 0
2021-02-04 | 05:00 | 0
2021-02-04 | 06:00 | 0
2021-02-04 | 07:00 | 0
2021-02-04 | 08:00 | 0
2021-02-04 | 09:00 | 0
2021-02-04 | 10:00 | 0
2021-02-04 | 11:00 | 0
2021-02-04 | 12:00 | 0
2021-02-04 | 13:00 | 1
2021-02-04 | 14:00 | 3
2021-02-04 | 15:00 | 1
2021-02-04 | 16:00 | 1
2021-02-04 | 17:00 | 1
2021-02-04 | 18:00 | 0
2021-02-04 | 19:00 | 0
2021-02-04 | 20:00 | 0
2021-02-04 | 21:00 | 0
2021-02-04 | 22:00 | 0
2021-02-04 | 23:00 | 0
2021-02-04 | 24:00 | 0

Considerations

  • "Concurrent" means at the same point in time. Thus user2 and user3 do not overlap for 13:00, but user4 and user6 do overlap for 14:00 even though they only overlap for 1 minute.
  • User sessions can span multiple hours and would thus count for each hour they are part of.
  • Each user can only be online once at one point in time.
  • If there are no users for a particular hour, this should return 0.

Similar questions

A similar question was answered here: Count max. number of concurrent user sessions per day by Erwin Brandstetter. However, this is per day rather than per hour, and I am apparently too much of a noob at postgreSQL to be able to translate it into hourly so I'm hoping someone can help.

sharktacos
  • 17
  • 5

2 Answers2

1

I would decompose this into two problems:

  1. Find the number of overlaps and when they begin and end.
  2. Find the hours.

Note two things:

  • I am assuming that '2014-04-03 17:59:00' is a typo.
  • The following goes by the beginning of the hour and puts the date/hour in a single column.

First, calculate the overlaps. For this, unpivot the logins and logout. Put in a counter of +1 for logins and -1 for logouts and do a cumulative sum. This looks like:

with overlap as (
      select v.ts, sum(v.inc) as inc,
             sum(sum(v.inc)) over (order by v.ts) as num_overlaps,
             lead(v.ts) over (order by v.ts) as next_ts
      from sessions s cross join lateral
           (values (login_ts, 1), (logout_ts, -1)) v(ts, inc)
      group by v.ts
     )
select *
from overlap
order by ts;

For the next step, use generate_series() to generate timestamps one hour apart. Look for the maximum value during that period using left join and group by:

with overlap as (
      select v.ts, sum(v.inc) as inc,
             sum(sum(v.inc)) over (order by v.ts) as num_overlaps,
             lead(v.ts) over (order by v.ts) as next_ts
      from sessions s cross join lateral
           (values (login_ts, 1), (logout_ts, -1)) v(ts, inc)
      group by v.ts
     )
select gs.hh, coalesce(max(o.num_overlaps), 0) as num_overlaps
from generate_series('2021-02-03'::date, '2021-02-05'::date, interval '1 hour') gs(hh) left join
     overlap o
     on o.ts < gs.hh + interval '1 hour' and
        o.next_ts > gs.hh
group by gs.hh
order by gs.hh;

Here is a db<>fiddle using your data fixed with the a reasonable logout time for the last record.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks. Is it possible to "pivot" the data so a new column is created for each new date? – sharktacos Mar 05 '21 at 15:56
  • @sharktacos . . . You can do that using conditional aggregation. It is much messier if you don't know the dates for the columns. – Gordon Linoff Mar 05 '21 at 16:28
  • Sorry I'm not following what you mean by that. Could you possibly edit your above query to make it use "conditional aggregation"? It could be a column for each day of week `TO_CHAR(gs.hh, 'Dy')` which is known. – sharktacos Mar 07 '21 at 02:15
  • @sharktacos . . . I would suggest that you ask a new question with appropriate sample data and desired results. – Gordon Linoff Mar 07 '21 at 03:34
0

For any time period you can calculate number of concurrent sesions using OVERLAPS operator in SQL:

CREATE TEMP TABLE sessions (
  user_id text not null,
  login_ts timestamp,
  logout_ts timestamp );

INSERT INTO sessions SELECT 'webuser', d,
  d+((1+random()*300)::text||' seconds')::interval
FROM generate_series(
  '2021-02-28 07:42'::timestamp,
  '2021-03-01 07:42'::timestamp,
  '5 seconds'::interval) AS d;

SELECT s1.user_id, s1.login_ts, s1.logout_ts, 
(select count(*) FROM sessions s2 
 WHERE (s2.login_ts, s2.logout_ts) OVERLAPS (s1.login_ts, s1.logout_ts)) 
 AS parallel_sessions
FROM sessions s1 LIMIT 10;

 user_id |      login_ts       |         logout_ts          | parallel_sessions
---------+---------------------+----------------------------+------------------
 webuser | 2021-02-28 07:42:00 | 2021-02-28 07:42:25.528594 |                6
 webuser | 2021-02-28 07:42:05 | 2021-02-28 07:45:50.513769 |               47
 webuser | 2021-02-28 07:42:10 | 2021-02-28 07:44:18.810066 |               28
 webuser | 2021-02-28 07:42:15 | 2021-02-28 07:45:17.3888   |               40
 webuser | 2021-02-28 07:42:20 | 2021-02-28 07:43:14.325476 |               15
 webuser | 2021-02-28 07:42:25 | 2021-02-28 07:43:44.174841 |               21
 webuser | 2021-02-28 07:42:30 | 2021-02-28 07:43:32.679052 |               18
 webuser | 2021-02-28 07:42:35 | 2021-02-28 07:45:12.554117 |               38
 webuser | 2021-02-28 07:42:40 | 2021-02-28 07:46:37.94311  |               55
 webuser | 2021-02-28 07:42:45 | 2021-02-28 07:43:08.398444 |               13
(10 rows)

This work well on small data sets but for better performance, use PostgreSQL Range Types as below. This works on postgres 9.2 and later.

ALTER TABLE sessions ADD timerange tsrange;
UPDATE sessions SET timerange = tsrange(login_ts,logout_ts);
CREATE INDEX ON sessions USING gist (timerange);

CREATE TEMP TABLE level1 AS
SELECT s1.user_id, s1.login_ts, s1.logout_ts,
(select count(*) FROM sessions s2 
 WHERE s2.timerange && s1.timerange) AS parallel_sessions
FROM sessions s1;

SELECT date_trunc('hour',login_ts) AS hour, count(*),
max(parallel_sessions)
FROM level1
GROUP BY hour;
        hour         | count | max 
---------------------+-------+-----
 2021-02-28 14:00:00 |   720 |  98
 2021-03-01 03:00:00 |   720 |  99
 2021-03-01 06:00:00 |   720 |  94
 2021-02-28 09:00:00 |   720 |  96
 2021-02-28 10:00:00 |   720 |  97
 2021-02-28 18:00:00 |   720 |  94
 2021-02-28 11:00:00 |   720 |  97
 2021-03-01 00:00:00 |   720 |  97
 2021-02-28 19:00:00 |   720 |  99
 2021-02-28 16:00:00 |   720 |  94
 2021-02-28 17:00:00 |   720 |  95
 2021-03-01 02:00:00 |   720 |  99
 2021-02-28 08:00:00 |   720 |  96
 2021-02-28 23:00:00 |   720 |  94
 2021-03-01 07:00:00 |   505 |  92
 2021-03-01 04:00:00 |   720 |  95
 2021-02-28 21:00:00 |   720 |  97
 2021-03-01 01:00:00 |   720 |  93
 2021-02-28 22:00:00 |   720 |  96
 2021-03-01 05:00:00 |   720 |  93
 2021-02-28 20:00:00 |   720 |  95
 2021-02-28 13:00:00 |   720 |  95
 2021-02-28 12:00:00 |   720 |  97
 2021-02-28 15:00:00 |   720 |  98
 2021-02-28 07:00:00 |   216 |  93
(25 rows)
filiprem
  • 6,721
  • 1
  • 29
  • 42