0

I have the below events data:

events table

event_id | timestamp         | label
--------------------------------------------------
1        |2021-11-15 10:23:54| so
1        |2021-11-15 10:24:30| software
2        |2021-11-15 10:24:54| software Engineer
3        |2021-11-15 10:23:54| test
3        |2021-11-15 10:24:31| test eng
3        |2021-11-15 10:24:54| test Engineer
2        |2021-11-16 10:26:54| software Engineer and Test

I want to collapse this result set to the below result set.

Result:

event_id | timestamp         | label
----------------------------------------------------------
2        |2021-11-15 10:24:54| software Engineer
3        |2021-11-15 10:24:31| test Engineer
2        |2021-11-16 10:26:54| software Engineer and Test

Basically, I want to remove all the records that are substring of any records in future timestamp and have occurred in less than 3 mins interval.

In the above example: so and software is removed because software Engineer is seen in the later timestamp, which is within 3 minutes interval.

event_id isn't necessarily always same.

Is this something possible using a query?

I know that it is easy to do using a program/script. Still, I would like to know how this can be done using a query.

Bikas Katwal
  • 1,895
  • 1
  • 21
  • 42
  • 1
    The description of the algorithm is inconsistent. According to it, the row with `software Engineer` should also be removed. – klin Nov 16 '21 at 11:53
  • Event with `software Engineer and Test` label comes the next day and not within 3 mins interval. `2021-11-16 10:26:54`. Hence this record doesn't overshadow the `software engineer` – Bikas Katwal Nov 16 '21 at 11:57
  • Are you really using Postgres 8? That would rule out window functions. – Jeremy Nov 16 '21 at 12:10
  • I am using redshift :) Which uses Postgres 8 – Bikas Katwal Nov 16 '21 at 12:18
  • @BikasKatwal - OK, my oversight, sorry. Postgres 8 is archaic. In modern Postgres you could use a window function. – klin Nov 16 '21 at 12:20

0 Answers0