9

I have a table like this:

Name   activity  time

user1  A1        12:00
user1  E3        12:01
user1  A2        12:02
user2  A1        10:05
user2  A2        10:06
user2  A3        10:07
user2  M6        10:07
user2  B1        10:08
user3  A1        14:15
user3  B2        14:20
user3  D1        14:25
user3  D2        14:30

Now, I need a result like this:

Name   activity  next_activity

user1  A2        NULL
user2  A3        B1
user3  A1        B2

I would like to check for every user the last activity from group A and what type of activity took place next from group B (activity from group B always takes place after activity from group A). Other types of activity are not interesting for me. I've tried to use the lead() function, but it hasn't worked.

How I can solve my problem?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
KolM
  • 101
  • 1
  • 1
  • 2
  • `activity from group B always takes place after activity from group A`. That means, there are never two consecutive "B" activities per user? Also: your sample data suggests that all activities per user always happen in sequence, i.e. A1 -> A2 -> A3, .. never A3 -> A2. Correct? And: Postgres version? – Erwin Brandstetter Jan 22 '17 at 23:25
  • yes :) you're right :) – KolM Jan 22 '17 at 23:30

2 Answers2

11

Your definition:

activity from group B always takes place after activity from group A.

.. logically implies that there is, per user, 0 or 1 B activity after 1 or more A activities. Never more than 1 B activities in sequence.

You can make it work with a single window function, DISTINCT ON and CASE, which should be the fastest way for few rows per user (also see below):

SELECT name
     , CASE WHEN a2 LIKE 'B%' THEN a1 ELSE a2 END AS activity
     , CASE WHEN a2 LIKE 'B%' THEN a2 END AS next_activity
FROM  (
   SELECT DISTINCT ON (name)
          name
        , lead(activity) OVER (PARTITION BY name ORDER BY time DESC) AS a1
        , activity AS a2
   FROM   t
   WHERE (activity LIKE 'A%' OR activity LIKE 'B%')
   ORDER  BY name, time DESC
   ) sub;

db<>fiddle here

An SQL CASE expression defaults to NULL if no ELSE branch is added, so I kept that short.

Assuming time is defined NOT NULL. Else, you might want to add NULLS LAST. Why?

(activity LIKE 'A%' OR activity LIKE 'B%') is more verbose than activity ~ '^[AB]', but typically faster in older versions of Postgres. About pattern matching:

Conditional window functions?

That's actually possible. You can combine the aggregate FILTER clause with the OVER clause of window functions. However:

  1. The FILTER clause itself can only work with values from the current row.

  2. More importantly, FILTER is not implemented for pure genuine functions like lead() or lag() (up to Postgres 13) - only for aggregate functions.

If you try:

lead(activity) FILTER (WHERE activity LIKE 'A%') OVER () AS activity

Postgres will tell you:

FILTER is not implemented for non-aggregate window functions

About FILTER:

Performance

For few users with few rows per user, pretty much any query is fast, even without index.

For many users and few rows per user, the first query above should be fastest. See:

For many rows per user, there are (potentially much) faster techniques, depending on details of your setup. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0
select      distinct on(name) name,activity,next_activity

from       (select name,activity,time
                  ,lead(activity) over (partition by name order by time) as next_activity

            from   t

            where  left(activity,1) in ('A','B')
            ) t

where       left(activity,1) = 'A'

order by    name,time desc
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88