0

I found solutions that find the next event date, but not ones that would include all the data from the event. With cheating, I can get it done, but that only works in mysql and fails in vertica.

Here is the problem I am trying to solve:

I want to show all events a with data from the first event X that follows a and is not of type a. So here is the cut and paste example so you can play with it to see what actually works:

CREATE TABLE events (user_id int ,created_at int, event varchar(20));
INSERT INTO events values (0,0, 'a');
INSERT INTO events values (0,1, 'b');
INSERT INTO events values (0,2, 'c');
INSERT INTO events values (0,3, 'a');
INSERT INTO events values (0,4, 'c');
INSERT INTO events values (0,5, 'b');
INSERT INTO events values (0,6, 'a');
INSERT INTO events values (0,7, 'a');
INSERT INTO events values (0,8, 'd');

SELECT * FROM events;
+---------+------------+-------+
| user_id | created_at | event |
+---------+------------+-------+
|       0 |          0 | a     |
|       0 |          1 | b     |
|       0 |          2 | c     |
|       0 |          3 | a     |
|       0 |          4 | c     |
|       0 |          5 | b     |
|       0 |          6 | a     |
|       0 |          7 | a     |
|       0 |          8 | d     |
+---------+------------+-------+
9 rows in set (0.00 sec)

Here is the result I know how to get in both, but I cannot seem to be able to get the event info in it as well:

SELECT user_id, MAX(purchased) AS purchased, spent 
FROM ( 
    SELECT
        e1.user_id AS user_id, e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3 GROUP BY user_id, spent;

 user_id | purchased | spent 
---------+-----------+-------
       0 |         0 |     1
       0 |         3 |     4
       0 |         7 |     8

Now if I want the event type in there as well, it does not work with the query above, because you either have to use the event field in the group-by (not what we want) or with an aggregate (not what we want either). Funny enough in mysql it works, but I consider that cheating and since I have to use vertica for this, it won't help me:

SELECT user_id, MAX(purchased) as purchased, spent, event FROM (
    SELECT 
        e1.User_ID AS user_id, 
        e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent, 
        e2.event AS event 
    FROM events e1, events e2 
    WHERE 
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND 
        e1.Event = 'a' AND e2.Event != 'a' 
    GROUP BY
        e1.user_id,e1.created_at
 ) e3 GROUP BY user_id, spent;


+---------+-----------+-------+-------+
| user_id | purchased | spent | event |
+---------+-----------+-------+-------+
|       0 |         0 |     1 | b     |
|       0 |         3 |     4 | c     |
|       0 |         7 |     8 | d     |
+---------+-----------+-------+-------+
3 rows in set (0.00 sec)

For vertica the same query throws an error: ERROR 2640: Column "e2.event" must appear in the GROUP BY clause or be used in an aggregate function

Whats an elegant solution to get the two events paired up with all their columns and without cheating so that I can get the same result as shown above when executing in vertica or some other database that does not allow the cheat? In the sample data I have exactly one additional column I want, which is the event type, but in the real world it would be two or three columns.

Please try it out with the sample data posted before answering :)

Michael Will
  • 105
  • 1
  • 6
  • I'm a little confused by what you want from the explanation, but the outputs kind of help. It sounds like you want the information about the event that immediately follows an "a" event that is not also an "a" event. If that is correct then "spent" is always going to be one more than "purchased". If this is true then I can write something up to get you your final output. – Eric J. Price Jan 10 '13 at 20:30
  • Ahh, it might not be one more if another user_id has information in between... ok, solution coming up. – Eric J. Price Jan 10 '13 at 20:32

3 Answers3

0

Ok, while I'm not 100% sure I understand what you're trying to do, see if this won't work:

SELECT e3.user_id, MAX(e3.purchased) AS purchased, e3.spent, e.event
FROM ( 
    SELECT
        e1.user_id AS user_id, e1.created_at AS purchased, 
        MIN(e2.created_at) AS spent
    FROM events e1, events e2
    WHERE
        e1.user_id = e2.user_id AND e1.created_at <= e2.created_at AND
        e1.Event = 'a' AND e2.Event != 'a'
    GROUP BY e1.user_id, e1.created_at
) e3 
 JOIN events e on e3.user_id = e.user_id and e3.spent = e.created_at
GROUP BY e3.user_id, e3.spent, e.event

Essentially I'm just joining on the event table again assuming user_id and created_at are your primary key.

And here is the SQL Fiddle.

wickedone
  • 542
  • 1
  • 6
  • 18
sgeddes
  • 62,311
  • 6
  • 61
  • 83
0

Try this...

With    Cte As
(
        Select  Row_Number() Over (Partition By [user_id] Order By [created_at]) As row_num,
                [user_id],
                [created_at],
                [event]
        From    [events]
)
Select  c1.[user_id],
        c1.[created_at] As purchased,
        c2.[created_at] As spent,
        c2.[event]
From    Cte c1
Left    Join Cte c2
        On  c1.row_num = c2.row_num - 1
Where   c1.event = 'a'
And     c2.event <> 'a'
Eric J. Price
  • 2,740
  • 1
  • 15
  • 21
  • Neither vertica nor mysql have 'With': http://stackoverflow.com/questions/324935/mysql-with-clause – Michael Will Jan 11 '13 at 21:37
  • I created a view cte and then it shows that your solution also works: select c1.user_id, c1.created_at as purchased, c2.created_at as spent, c2.event from cte c1 left join cte c2 on c1.row_num=c2.row_num - 1 where c1.event='a' and c2.event != 'a'; – Michael Will Jan 11 '13 at 21:43
0

I usually do the "next" calculations using correlated subqueries, and then join back to the original table. In this case, I am assuming that , uniquely defines a row.

Here is the query:

SELECT user_id, MAX(purchased) as purchased, spent, event
FROM (
    SELECT e.User_ID, e.created_at AS purchased, 
           MIN(enext.created_at) AS spent,
           min(enext.event) AS event 
    FROM (select e.*,
                 (select MIN(e2.created_at)
                  from event e2
                  where e2.user_id = e.user_id and e2.created_at > e.created_at and e2.event <> 'a'
                 ) nextcreatedat
          from events e
          where e.event = 'a'
         ) e left outer join
         events enext
         on e.user_id = enext.user_id and
            e.nextcreatedat = enext.create_at
    GROUP BY e.user_id, e.created_at
    ) e3
 GROUP BY user_id, spent;

The aggregation GROUP BY e.user_id, e.created_at is not necessary, but I've left it in to remain similar to the original query.

Because Vertica supports cumulative sums, there is a way to do this more efficiently, but it would not work in MySQL.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786