2

I'm using postgres to run some analytics on user activity. I have a table of all requests(pageviews) made by every user and the timestamp of the request, and I'm trying to find the number of distinct sessions for every user. For the sake of simplicity, I'm considering every set of requests an hour or more apart from others as a distinct session. The data looks something like this:

id|          request_time|         user_id
1    2014-01-12 08:57:16.725533    1233
2    2014-01-12 08:57:20.944193    1234
3    2014-01-12 09:15:59.713456    1233
4    2014-01-12 10:58:59.713456    1234

How can I write a query to get the number of sessions per user?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Jonathan Knight
  • 135
  • 1
  • 5

1 Answers1

4

To start a new session after every gap >= 1 hour:

SELECT user_id, count(*) AS distinct_sessions
FROM (
   SELECT user_id
        ,(lag(request_time, 1, '-infinity') OVER (PARTITION BY user_id
                                                  ORDER BY request_time)
           <= request_time - '1h'::interval) AS step -- start new session
   FROM   tbl
   ) sub
WHERE  step
GROUP  BY user_id
ORDER  BY user_id;

Assuming request_time NOT NULL.

Explain:

  • In subquery sub, check for every row if a new session begins. Using the third parameter of lag() to provide the default -infinity, which is lower than any timestamp and therefore always starts a new session for the first row.

  • In the outer query count how many times new sessions started. Eliminate step = FALSE and count per user.

Alternative interpretation

If you really wanted to count hours where at least one request happened (I don't think you do, but another answer assumes as much), you would:

SELECT user_id
     , count(DISTINCT date_trunc('hour', request_time)) AS hours_with_req
FROM   tbl
GROUP  BY 1
ORDER  BY 1;
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • how would you go about returning a unique number per island instead of a boolean? (in order to find the mix/max within each island, or return the duration of a session) (working on a [similar problem](https://stackoverflow.com/q/55045788/1717535)) – Fabien Snauwaert Mar 08 '19 at 11:06