I found solutions that find the next event date, but not ones that would include all the data from the event. With cheating, I can get it done, but that only works in mysql and fails in vertica.
Here is the problem I am trying to solve:
I want to show all events a with data from the first event X that follows a and is not of type a. So here is the cut and paste example so you can play with it to see what actually works:
CREATE TABLE events (user_id int ,created_at int, event varchar(20));
INSERT INTO events values (0,0, 'a');
INSERT INTO events values (0,1, 'b');
INSERT INTO events values (0,2, 'c');
INSERT INTO events values (0,3, 'a');
INSERT INTO events values (0,4, 'c');
INSERT INTO events values (0,5, 'b');
INSERT INTO events values (0,6, 'a');
INSERT INTO events values (0,7, 'a');
INSERT INTO events values (0,8, 'd');
SELECT * FROM events;
+---------+------------+-------+
| user_id | created_at | event |
+---------+------------+-------+
| 0 | 0 | a |
| 0 | 1 | b |
| 0 | 2 | c |
| 0 | 3 | a |
| 0 | 4 | c |
| 0 | 5 | b |
| 0 | 6 | a |
| 0 | 7 | a |
| 0 | 8 | d |
+---------+------------+-------+
9 rows in set (0.00 sec)
Here is the result I know how to get in both, but I cannot seem to be able to get the event info in it as well:
SELECT user_id, MAX(purchased) AS purchased, spent
FROM (
SELECT
e1.user_id AS user_id, e1.created_at AS purchased,
MIN(e2.created_at) AS spent
FROM events e1, events e2
WHERE
e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
e1.Event = 'a' AND e2.Event != 'a'
GROUP BY e1.user_id, e1.created_at
) e3 GROUP BY user_id, spent;
user_id | purchased | spent
---------+-----------+-------
0 | 0 | 1
0 | 3 | 4
0 | 7 | 8
Now if I want the event type in there as well, it does not work with the query above, because you either have to use the event field in the group-by (not what we want) or with an aggregate (not what we want either). Funny enough in mysql it works, but I consider that cheating and since I have to use vertica for this, it won't help me:
SELECT user_id, MAX(purchased) as purchased, spent, event FROM (
SELECT
e1.User_ID AS user_id,
e1.created_at AS purchased,
MIN(e2.created_at) AS spent,
e2.event AS event
FROM events e1, events e2
WHERE
e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
e1.Event = 'a' AND e2.Event != 'a'
GROUP BY
e1.user_id,e1.created_at
) e3 GROUP BY user_id, spent;
+---------+-----------+-------+-------+
| user_id | purchased | spent | event |
+---------+-----------+-------+-------+
| 0 | 0 | 1 | b |
| 0 | 3 | 4 | c |
| 0 | 7 | 8 | d |
+---------+-----------+-------+-------+
3 rows in set (0.00 sec)
For vertica the same query throws an error: ERROR 2640: Column "e2.event" must appear in the GROUP BY clause or be used in an aggregate function
Whats an elegant solution to get the two events paired up with all their columns and without cheating so that I can get the same result as shown above when executing in vertica or some other database that does not allow the cheat? In the sample data I have exactly one additional column I want, which is the event type, but in the real world it would be two or three columns.
Please try it out with the sample data posted before answering :)