PostgreSQL count max number of concurrent user sessions per hour

Question

Situation

We have a PostgreSQL 9.1 database containing user sessions with login date/time and logout date/time per row. Table looks like this:

    user_id |        login_ts       |         logout_ts  
------------+--------------+--------------------------------
USER1      |  2021-02-03 09:23:00  |   2021-02-03 11:44:00
USER2      |  2021-02-03 10:49:00  |   2021-02-03 13:30:00
USER3      |  2021-02-03 13:32:00  |   2021-02-03 15:31:00
USER4      |  2021-02-04 13:50:00  |   2021-02-04 14:53:00
USER5      |  2021-02-04 14:44:00  |   2021-02-04 15:21:00
USER6      |  2021-02-04 14:52:00  |   2021-02-04 17:59:00

Goal

Would like to get the max number of concurrent users for each 24 hours of each day in the time range. Like this:

date       | hour  | sessions
-----------+-------+-----------
2021-02-03 | 01:00 | 0
2021-02-03 | 02:00 | 0
2021-02-03 | 03:00 | 0
2021-02-03 | 04:00 | 0
2021-02-03 | 05:00 | 0
2021-02-03 | 06:00 | 0
2021-02-03 | 07:00 | 0
2021-02-03 | 08:00 | 0
2021-02-03 | 09:00 | 1
2021-02-03 | 10:00 | 2
2021-02-03 | 11:00 | 2
2021-02-03 | 12:00 | 1
2021-02-03 | 13:00 | 1
2021-02-03 | 14:00 | 1
2021-02-03 | 15:00 | 0
2021-02-03 | 16:00 | 0
2021-02-03 | 17:00 | 0
2021-02-03 | 18:00 | 0
2021-02-03 | 19:00 | 0
2021-02-03 | 20:00 | 0
2021-02-03 | 21:00 | 0
2021-02-03 | 22:00 | 0
2021-02-03 | 23:00 | 0
2021-02-03 | 24:00 | 0
2021-02-04 | 01:00 | 0
2021-02-04 | 02:00 | 0
2021-02-04 | 03:00 | 0
2021-02-04 | 04:00 | 0
2021-02-04 | 05:00 | 0
2021-02-04 | 06:00 | 0
2021-02-04 | 07:00 | 0
2021-02-04 | 08:00 | 0
2021-02-04 | 09:00 | 0
2021-02-04 | 10:00 | 0
2021-02-04 | 11:00 | 0
2021-02-04 | 12:00 | 0
2021-02-04 | 13:00 | 1
2021-02-04 | 14:00 | 3
2021-02-04 | 15:00 | 1
2021-02-04 | 16:00 | 1
2021-02-04 | 17:00 | 1
2021-02-04 | 18:00 | 0
2021-02-04 | 19:00 | 0
2021-02-04 | 20:00 | 0
2021-02-04 | 21:00 | 0
2021-02-04 | 22:00 | 0
2021-02-04 | 23:00 | 0
2021-02-04 | 24:00 | 0

Considerations

"Concurrent" means at the same point in time. Thus user2 and user3 do not overlap for 13:00, but user4 and user6 do overlap for 14:00 even though they only overlap for 1 minute.
User sessions can span multiple hours and would thus count for each hour they are part of.
Each user can only be online once at one point in time.
If there are no users for a particular hour, this should return 0.

Similar questions

A similar question was answered here: Count max. number of concurrent user sessions per day by Erwin Brandstetter. However, this is per day rather than per hour, and I am apparently too much of a noob at postgreSQL to be able to translate it into hourly so I'm hoping someone can help.

score 1 · Accepted Answer · answered Mar 01 '21 at 13:25

I would decompose this into two problems:

Find the number of overlaps and when they begin and end.
Find the hours.

Note two things:

I am assuming that '2014-04-03 17:59:00' is a typo.
The following goes by the beginning of the hour and puts the date/hour in a single column.

First, calculate the overlaps. For this, unpivot the logins and logout. Put in a counter of +1 for logins and -1 for logouts and do a cumulative sum. This looks like:

with overlap as (
      select v.ts, sum(v.inc) as inc,
             sum(sum(v.inc)) over (order by v.ts) as num_overlaps,
             lead(v.ts) over (order by v.ts) as next_ts
      from sessions s cross join lateral
           (values (login_ts, 1), (logout_ts, -1)) v(ts, inc)
      group by v.ts
     )
select *
from overlap
order by ts;

For the next step, use generate_series() to generate timestamps one hour apart. Look for the maximum value during that period using left join and group by:

with overlap as (
      select v.ts, sum(v.inc) as inc,
             sum(sum(v.inc)) over (order by v.ts) as num_overlaps,
             lead(v.ts) over (order by v.ts) as next_ts
      from sessions s cross join lateral
           (values (login_ts, 1), (logout_ts, -1)) v(ts, inc)
      group by v.ts
     )
select gs.hh, coalesce(max(o.num_overlaps), 0) as num_overlaps
from generate_series('2021-02-03'::date, '2021-02-05'::date, interval '1 hour') gs(hh) left join
     overlap o
     on o.ts < gs.hh + interval '1 hour' and
        o.next_ts > gs.hh
group by gs.hh
order by gs.hh;

Here is a db<>fiddle using your data fixed with the a reasonable logout time for the last record.

Thanks. Is it possible to "pivot" the data so a new column is created for each new date? — sharktacos, Mar 05 '21 at 15:56
@sharktacos . . . You can do that using conditional aggregation. It is much messier if you don't know the dates for the columns. — Gordon Linoff, Mar 05 '21 at 16:28
Sorry I'm not following what you mean by that. Could you possibly edit your above query to make it use "conditional aggregation"? It could be a column for each day of week `TO_CHAR(gs.hh, 'Dy')` which is known. — sharktacos, Mar 07 '21 at 02:15
@sharktacos . . . I would suggest that you ask a new question with appropriate sample data and desired results. — Gordon Linoff, Mar 07 '21 at 03:34

filiprem · Answer 2 · 2021-03-01T07:23:15.417

For any time period you can calculate number of concurrent sesions using OVERLAPS operator in SQL:

CREATE TEMP TABLE sessions (
  user_id text not null,
  login_ts timestamp,
  logout_ts timestamp );

INSERT INTO sessions SELECT 'webuser', d,
  d+((1+random()*300)::text||' seconds')::interval
FROM generate_series(
  '2021-02-28 07:42'::timestamp,
  '2021-03-01 07:42'::timestamp,
  '5 seconds'::interval) AS d;

SELECT s1.user_id, s1.login_ts, s1.logout_ts, 
(select count(*) FROM sessions s2 
 WHERE (s2.login_ts, s2.logout_ts) OVERLAPS (s1.login_ts, s1.logout_ts)) 
 AS parallel_sessions
FROM sessions s1 LIMIT 10;

 user_id |      login_ts       |         logout_ts          | parallel_sessions
---------+---------------------+----------------------------+------------------
 webuser | 2021-02-28 07:42:00 | 2021-02-28 07:42:25.528594 |                6
 webuser | 2021-02-28 07:42:05 | 2021-02-28 07:45:50.513769 |               47
 webuser | 2021-02-28 07:42:10 | 2021-02-28 07:44:18.810066 |               28
 webuser | 2021-02-28 07:42:15 | 2021-02-28 07:45:17.3888   |               40
 webuser | 2021-02-28 07:42:20 | 2021-02-28 07:43:14.325476 |               15
 webuser | 2021-02-28 07:42:25 | 2021-02-28 07:43:44.174841 |               21
 webuser | 2021-02-28 07:42:30 | 2021-02-28 07:43:32.679052 |               18
 webuser | 2021-02-28 07:42:35 | 2021-02-28 07:45:12.554117 |               38
 webuser | 2021-02-28 07:42:40 | 2021-02-28 07:46:37.94311  |               55
 webuser | 2021-02-28 07:42:45 | 2021-02-28 07:43:08.398444 |               13
(10 rows)

This work well on small data sets but for better performance, use PostgreSQL Range Types as below. This works on postgres 9.2 and later.

ALTER TABLE sessions ADD timerange tsrange;
UPDATE sessions SET timerange = tsrange(login_ts,logout_ts);
CREATE INDEX ON sessions USING gist (timerange);

CREATE TEMP TABLE level1 AS
SELECT s1.user_id, s1.login_ts, s1.logout_ts,
(select count(*) FROM sessions s2 
 WHERE s2.timerange && s1.timerange) AS parallel_sessions
FROM sessions s1;

SELECT date_trunc('hour',login_ts) AS hour, count(*),
max(parallel_sessions)
FROM level1
GROUP BY hour;
        hour         | count | max 
---------------------+-------+-----
 2021-02-28 14:00:00 |   720 |  98
 2021-03-01 03:00:00 |   720 |  99
 2021-03-01 06:00:00 |   720 |  94
 2021-02-28 09:00:00 |   720 |  96
 2021-02-28 10:00:00 |   720 |  97
 2021-02-28 18:00:00 |   720 |  94
 2021-02-28 11:00:00 |   720 |  97
 2021-03-01 00:00:00 |   720 |  97
 2021-02-28 19:00:00 |   720 |  99
 2021-02-28 16:00:00 |   720 |  94
 2021-02-28 17:00:00 |   720 |  95
 2021-03-01 02:00:00 |   720 |  99
 2021-02-28 08:00:00 |   720 |  96
 2021-02-28 23:00:00 |   720 |  94
 2021-03-01 07:00:00 |   505 |  92
 2021-03-01 04:00:00 |   720 |  95
 2021-02-28 21:00:00 |   720 |  97
 2021-03-01 01:00:00 |   720 |  93
 2021-02-28 22:00:00 |   720 |  96
 2021-03-01 05:00:00 |   720 |  93
 2021-02-28 20:00:00 |   720 |  95
 2021-02-28 13:00:00 |   720 |  95
 2021-02-28 12:00:00 |   720 |  97
 2021-02-28 15:00:00 |   720 |  98
 2021-02-28 07:00:00 |   216 |  93
(25 rows)

PostgreSQL count max number of concurrent user sessions per hour

2 Answers2