I have data that looks like this:

Name    | Date                    | Event    | Event_ID
--------+-------------------------+----------+----------
BRADLEY | 2014-12-01 16:15:26.442 | ACCESSED | 268766
BRADLEY | 2014-12-01 16:15:36.794 | ACCESSED | 268766
BRADLEY | 2014-12-01 16:15:50.618 | DENIED   | 268766
BRADLEY | 2014-12-01 16:16:04.890 | DENIED   | 268766
BRADLEY | 2014-12-01 16:18:01.036 | DENIED   | 268766
BRADLEY | 2014-12-01 16:18:31.335 | DENIED   | 268766
CHARLES | 2014-12-01 08:33:34.831 | ACCESSED | 445317
CHARLES | 2014-12-01 08:33:44.041 | ACCESSED | 445317
CHARLES | 2014-12-01 14:56:49.872 | ACCESSED | 10333360
CHARLES | 2014-12-01 14:56:57.549 | ACCESSED | 10333360
CHARLES | 2014-12-01 14:56:59.248 | ACCESSED | 10333360
CHARLES | 2014-12-01 14:57:02.221 | ACCESSED | 10333360
CHARLES | 2014-12-01 14:57:03.226 | ACCESSED | 10333360
My requirement is that I need to remove ACCESSED events that occur within 15 minutes of each other. For BRADLEY, for example, I would remove the second ACCESSED row at timestamp 16:15:36.794. That part is easy for me: I can join the table to itself, compare each row to the next one, and apply the logic on Date, roughly like the sketch below.
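A minimal sketch of that self-join, assuming a table named access_log with the columns above (the table name is my placeholder, not the real schema); it flags every ACCESSED row that has an earlier ACCESSED row for the same name less than 15 minutes before it:

    -- Rows returned here are the "duplicates" to remove.
    SELECT DISTINCT later.name, later.date, later.event, later.event_id
    FROM   access_log AS earlier
    JOIN   access_log AS later
      ON   later.name  = earlier.name
     AND   later.event = 'ACCESSED'
     AND   later.date  > earlier.date
     AND   later.date  < earlier.date + INTERVAL '15 minutes'
    WHERE  earlier.event = 'ACCESSED';

For BRADLEY this flags exactly the 16:15:36.794 row.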
Now the issue I'm coming across is CHARLES. His Event_ID of 10333360 is a bit more complicated than BRADLEY's case. For CHARLES, I need to remove every ACCESSED row with Event_ID of 10333360 except the one with Date of 14:56:49.872, because the rule is to remove all dates that are within 15 minutes of the start of a new Event_ID. The real-world issue is that there are too many "duplicates" when a user is ACCESSED, and I'm doing data cleanup to remove all this unnecessary ACCESSED data.
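To make the target concrete, after the cleanup CHARLES should be left with only these two ACCESSED rows (08:33:44.041 is within 15 minutes of the start of Event_ID 445317, and everything after 14:56:49.872 is within 15 minutes of the start of Event_ID 10333360):

    CHARLES | 2014-12-01 08:33:34.831 | ACCESSED | 445317
    CHARLES | 2014-12-01 14:56:49.872 | ACCESSED | 10333360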
I thought about using window functions in Postgres, but there doesn't seem to be anything there that can handle the logic of comparing the dates (http://www.postgresql.org/docs/9.1/static/functions-window.html).
I do have some ideas on how to tackle this problem using stored procedures and temp tables, so that I can actually use variables in a Java-like way; a rough sketch of that idea is below. But of course I want it to be efficient, and I'm hoping to learn new techniques for tackling a problem like this.
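For what it's worth, this is roughly the procedural approach I have in mind (a sketch only, assuming the same access_log table as above and a made-up function name): walk the ACCESSED rows in order, remember the timestamp that starts each (Name, Event_ID) group, and report every later row that falls within 15 minutes of that start.

    CREATE OR REPLACE FUNCTION find_redundant_accessed()
    RETURNS TABLE (name text, date timestamp, event_id bigint) AS $$
    DECLARE
        rec         RECORD;
        group_name  text;
        group_id    bigint;
        group_start timestamp;
    BEGIN
        FOR rec IN
            SELECT a.name, a.date, a.event_id
            FROM   access_log a
            WHERE  a.event = 'ACCESSED'
            ORDER  BY a.name, a.event_id, a.date
        LOOP
            IF group_name IS DISTINCT FROM rec.name
               OR group_id IS DISTINCT FROM rec.event_id THEN
                -- First row of a new (Name, Event_ID) group: keep it and
                -- remember when the group started.
                group_name  := rec.name;
                group_id    := rec.event_id;
                group_start := rec.date;
            ELSIF rec.date < group_start + INTERVAL '15 minutes' THEN
                -- Within 15 minutes of the group start: report for removal.
                name     := rec.name;
                date     := rec.date;
                event_id := rec.event_id;
                RETURN NEXT;
            END IF;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

Rows more than 15 minutes past the group start are simply kept, matching the rule above.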