0

I'm running Postgres 9.6. I have a user table:

 id                            | integer                     |           | not null | nextval('user_id_seq'::regclass)
 name                          | character varying           |           | not null | 

And I have a history table:

 actioned_by      | integer                     |           |          | 
 category         | character varying           |           | not null | 
 t_actioned       | timestamp without time zone |           | not null |     

I want a way to annotate each user with an integer field num_days that indicates for how many distinct 24 hour periods they have actions.

I know how to get the count of distinct calendar days on which each user was active:

 SELECT d.actioned_by, COUNT(*) AS cnt FROM 
  (SELECT date_trunc('day', t_actioned) AS day, actioned_by 
   FROM history 
   GROUP BY day, actioned_by) d 
 GROUP BY actioned_by
 ORDER BY cnt DESC;

But is there a way to measure these periods would be measured from the time of their first action, to allow for people who log in at five minutes to midnight and leave 10 minutes later?

I'll just be running this occasionally for analytics purposes, so it doesn't really matter if it's slow.

Andrew_Lvov
  • 4,621
  • 2
  • 25
  • 31
Richard
  • 62,943
  • 126
  • 334
  • 542
  • @a_horse_with_no_name Do not think this is a duplicate. In the other question it is asked how to group by a certain time interval. The question here is slightly different: Counting the intervals independent of their row count. – S-Man Sep 10 '18 at 07:58

1 Answers1

0

db<>fiddle

SELECT 
    *, 
    trunc(                                                           -- E
        extract(epoch FROM (                                         -- C
            t_actioned -                                             -- B
            first_value(t_actioned) OVER (PARTITION BY actioned_by ORDER BY t_actioned)  -- A
         )) / 60 / 60 / 24                                           -- D
    ) + 1 as num_days                                                -- F
FROM history 

A: Get first timestamp per user using window function first_value https://www.postgresql.org/docs/current/static/tutorial-window.html

B: Get difference between current timestamp of the row and the very first of the user. Result is an interval

C: Casting the interval into seconds

D: Convert the seconds into days

E: Round to full days (cut the digits)

F: Add 1 because you do not want the difference of days (2 timestamps in one days yields in days == 0) but the number of days

Test data:

actioned_by  t_actioned           
-----------  -------------------  
1            2018-09-08 23:55:00  
1            2018-09-09 00:10:00  
1            2018-09-10 00:05:00  
1            2018-09-10 00:15:00  
2            2018-09-09 20:15:00  
2            2018-09-10 02:15:00  
2            2018-09-10 08:15:00  
2            2018-09-10 14:15:00  
2            2018-09-10 20:14:00 

Result:

actioned_by  t_actioned           num_days  
-----------  -------------------  --------  
1            2018-09-08 23:55:00  1.0       
1            2018-09-09 00:10:00  1.0       
1            2018-09-10 00:05:00  2.0       
1            2018-09-10 00:15:00  2.0       
2            2018-09-09 20:15:00  1.0       
2            2018-09-10 02:15:00  1.0       
2            2018-09-10 08:15:00  1.0       
2            2018-09-10 14:15:00  1.0       
2            2018-09-10 20:14:00  1.0        
S-Man
  • 22,521
  • 7
  • 40
  • 63