I have table storing events occurring to users as shown in http://sqlfiddle.com/#!15/2b559/2/0
event_id(integer)
user_id(integer)
event_type(integer)
timestamp(timestamp)
A sample of the data looks as follows:
+-----------+----------+-------------+----------------------------+
| event_id | user_id | event_type | timestamp |
+-----------+----------+-------------+----------------------------+
| 1 | 1 | 1 | January, 01 2015 00:00:00 |
| 2 | 1 | 1 | January, 10 2015 00:00:00 |
| 3 | 1 | 1 | January, 20 2015 00:00:00 |
| 4 | 1 | 1 | January, 30 2015 00:00:00 |
| 5 | 1 | 1 | February, 10 2015 00:00:00 |
| 6 | 1 | 1 | February, 21 2015 00:00:00 |
| 7 | 1 | 1 | February, 22 2015 00:00:00 |
+-----------+----------+-------------+----------------------------+
I would like to get, for each event, the number of events of the same user and the same event_type that occurred within 30 days before the event.
It should look like the following:
+-----------+----------+-------------+-----------------------------+-------+
| event_id | user_id | event_type | timestamp | count |
+-----------+----------+-------------+-----------------------------+-------+
| 1 | 1 | 1 | January, 01 2015 00:00:00 | 1 |
| 2 | 1 | 1 | January, 10 2015 00:00:00 | 2 |
| 3 | 1 | 1 | January, 20 2015 00:00:00 | 3 |
| 4 | 1 | 1 | January, 30 2015 00:00:00 | 4 |
| 5 | 1 | 1 | February, 10 2015 00:00:00 | 3 |
| 6 | 1 | 1 | February, 21 2015 00:00:00 | 3 |
| 7 | 1 | 1 | February, 22 2015 00:00:00 | 4 |
+-----------+----------+-------------+-----------------------------+-------+
The table contains millions of rows so I cannot go with a correlated subquery as suggested by @jpw in the answers below.
So far I managed to get the total number of events that occurred before with the same user_id and same event_id by using the following query:
SELECT event_id, user_id,event_type,"timestamp",
COUNT(event_type) OVER w
FROM events
WINDOW w AS (PARTITION BY user_id,event_type ORDER BY timestamp
ROWS UNBOUNDED PRECEDING);
With the following result:
+-----------+----------+-------------+-----------------------------+-------+
| event_id | user_id | event_type | timestamp | count |
+-----------+----------+-------------+-----------------------------+-------+
| 1 | 1 | 1 | January, 01 2015 00:00:00 | 1 |
| 2 | 1 | 1 | January, 10 2015 00:00:00 | 2 |
| 3 | 1 | 1 | January, 20 2015 00:00:00 | 3 |
| 4 | 1 | 1 | January, 30 2015 00:00:00 | 4 |
| 5 | 1 | 1 | February, 10 2015 00:00:00 | 5 |
| 6 | 1 | 1 | February, 21 2015 00:00:00 | 6 |
| 7 | 1 | 1 | February, 22 2015 00:00:00 | 7 |
+-----------+----------+-------------+-----------------------------+-------+
Do you know if there a way to change the window frame specification or the COUNT function so only the number of events which occurred within x days is returned?
In a second time, I would like to exclude duplicate events, i.e. same event_type and same timestamp.