1

I'm using postgreSQL 8.0 and I have a table with user_id, timestamp, and event_id.

How can I return the rows (or row) after the 4th occurrence of event_id = someID per user?

|---------------------|--------------------|------------------|
|      user_id        |     timestamp      |     event_id     |
|---------------------|--------------------|------------------|
|          1          |  2020-04-02 12:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 13:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 14:00  |        99        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 15:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 16:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 17:00  |        11        |
|---------------------|--------------------|------------------|
|          2          |  2020-04-02 17:00  |        11        |
|---------------------|--------------------|------------------|

Ie if event_id = 11, I would only want the last row in the table above.

Indy
  • 75
  • 6
  • Postgres 8.0 has been unmaintained for nearly 10 years. Why are you using such an old version? Or are you maybe using some fork based on that archaic version? What does `select version();` give you? –  Apr 03 '20 at 05:25

3 Answers3

0

You can use window functions:

select *
from (
    select t.*, row_number() over(partition by user_id, event_id order by timestamp) rn
    from mytable t
) t
where rn > 4

Here is a little trick that removes the row number from the result:

select (t).*
from (
    select t, row_number() over(partition by user_id, event_id order by timestamp) rn
    from mytable t
) x
where rn > 4
GMB
  • 216,147
  • 25
  • 84
  • 135
  • I'm not sure what the problem is, but this doesn't seem to be returning the correct results. It seems to miss a lot of rows – Indy Apr 02 '20 at 22:55
  • @Indy: well as far as I see this query does work for your sample data (ie it only returns the "last" row). Is there something different in your real data? – GMB Apr 03 '20 at 00:05
  • I could dig into the particulars of the data, but I need all rows after 4 occurrences of a *particular* event_id. The code above does not even take that as input (ie all rows after 4 occurrences of event_id=11) – Indy Apr 03 '20 at 00:38
0

You can use a cumulative count. This version includes the 4th occurrence:

select t.*
from (select t.*,
             count(*) filter (where event_id = 11) over (partition by user_id order by timestamp) as event_11_cnt
      from t
     ) t
where event_11_cnt >= 4;

The filter has been valid Postgres syntax for a long time, but instead, you can use:

select t.*
from (select t.*,
             sum( (event_id = 11)::int ) over (partition by user_id order by timestamp) as event_11_cnt
      from t
     ) t
where event_11_cnt >= 4;

This version does not:

where event_11_cnt > 4 or (event_11_cnt = 4 and event_id <> 11)

An alternative method:

select t.*
from t
where t.timestamp > (select t2.timestamp
                     from t t2
                     where t2.user_id = t.user_id and
                           t2.event_id = 11
                     order by t2.timestamp
                     limit 1 offset 3
                    );
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The "filter (where" line is causing a syntax error for me (I'm not really familiar with this syntax myself either). The alternative method throws a "ERROR: This type of correlated subquery pattern is not supported yet". Maybe you are using a newer version of postgres? – Indy Apr 02 '20 at 23:03
  • 1
    @Indy . . . You must be using an archaic version of Postgres. `filter` has been supported for a while. – Gordon Linoff Apr 02 '20 at 23:26
0

sorry to be asking about such an old version of Postgres, here is an answer that worked:

WITH EventOrdered AS(
  SELECT 
    EventTypeId
    , UserId
    , Timestamp
    , ROW_NUMBER() OVER (PARTITION BY EventTypeId, UserId ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) ROW_NO
  FROM Event),
FourthEvent AS (
  SELECT DISTINCT
    UserID
  , FIRST_VALUE(TimeStamp) OVER (PARTITION BY UserId ORDER BY Timestamp) FirstFourthEventTimestamp
  FROM EventOrdered
  WHERE ROW_NO = 4)
SELECT e.*
FROM Event e
JOIN FourthEvent ffe
  ON e.UserId = ffe.UserId
  AND e.Timestamp > ffe.FirstFourthEventTimestamp
ORDER BY e.UserId, e.Timestamp
Indy
  • 75
  • 6
  • That will definitely not work on Postgres 8.0 that version neither had CTE nor window functions. What exactly does `select version();` show you? –  Apr 08 '20 at 11:03