Grouping based on sequence of rows

Question

I have a table of orders with a column denoting whether it's a buy or a sell, with the rows typically ordered by timestamp. What I'd like to do is operate on groups of consecutive buys, plus their sell. e.g. B B S B S B B S -> (B B S) (B S) (B B S)

Example:

order_action |      timestamp      
-------------+---------------------
buy          | 2013-10-03 13:03:02
buy          | 2013-10-08 13:03:02
sell         | 2013-10-10 15:58:02
buy          | 2013-11-01 09:30:02
buy          | 2013-11-01 14:03:02
sell         | 2013-11-07 10:34:02
buy          | 2013-12-03 15:46:02
sell         | 2013-12-09 16:00:03
buy          | 2013-12-11 13:02:02
sell         | 2013-12-18 15:59:03

I'll be running an aggregation function in the end (the groups are so that I can exclude an entire group based on its sell order), so GROUP BY or partitioned windows seemed like the right way to go, but I can't figure out how to get this specific grouping.

`What I'd like to do is operate on groups of consecutive buys...` What is `operate`? That is, what do you actually want to **do**? And what problem are you having with doing it? It's tricky suggesting solutions for undefined problems. — user2338816, Jul 12 '14 at 01:30
I've asked a similar question before, it may be helpful. http://stackoverflow.com/questions/18889056/finding-a-run-of-rows-from-an-ordered-result-set — Josh Smeaton, Jul 12 '14 at 01:54

Erwin Brandstetter · Answer 1 · 2020-11-25T17:54:53.900

This can be surprisingly simple with count() as window aggregate function:

Postgres 9.4 or later with aggregate `FILTER`:

SELECT *, count(*) FILTER (WHERE order_action = 'sell')
                   OVER (ORDER BY ts DESC) AS grp
FROM   orders
ORDER  BY ts;

Or with ascending grp numbers:

SELECT *, count(*) FILTER (WHERE order_action = 'sell')
          OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM   orders
ORDER  BY ts;

See:

Aggregate columns with additional (distinct) filters

Using ts instead as timestamp as column name to avoid reserved words as identifiers.

count() returns a running count with the default frame definition from the start of the frame (the whole table in this case) up to the (last peer of) the current row. The running count of sells groups your rows as requested.
I am ordering descending in the OVER clause to let each group end at a trailing "sell", not a leading "sell". This results in descending group numbers. But that should not matter, you just needed group numbers.
Duplicate timestamps would be a problem (in any case!).

One way for ascending group numbers: use a custom FRAME definition for the window function:

In any Postgres version

SELECT *, count(order_action = 'sell' OR NULL) OVER (ORDER BY ts DESC) AS grp
FROM   orders
ORDER  BY ts;

SELECT *, count(order_action = 'sell' OR NULL)
          OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM   orders;

count() only counts non-null values. The expression order_action = 'sell' OR NULL results in TRUE for 'sell' and NULL otherwise.

db<>fiddle here - demonstrating all.
_{Old sqlfiddle}

Bob · Answer 2 · 2014-07-12T01:00:41.057

I don't have PostgreSQL, so i tried this over on SQL Fiddle

with sells as (
  select
    rank() over w grp,
    lag(timestamp,1,'2000-01-01') over w sd,
    timestamp td
  from
    orders
  where
    order_action = 'sell'
  window w as (order by timestamp)
)
select
  s.grp,
  o.order_action,
  o.timestamp
from
  orders o
join
  sells s
    on o.timestamp > s.sd
    and o.timestamp <= s.td
order by o.timestamp

Let me know if this works for you. This was my first time using PostgreSQL and I like it.

Gordon Linoff · Answer 3 · 2014-07-12T02:10:03.687

You can characterize the groups by counting the number of sells at or later than each row. You can do this with a cumulative sum to get the group that can then be used for aggregation. Here is an example:

select min(timestamp), max(timestamp), sum(case when order_action = 'buy' then 1 else 0 end) as buys
from (select o.*,
             sum(case when order_action = 'sell' then 1 else 0 end) over
                 (order by timestamp desc) as grp
      from orders o
     ) o
group by grp

Grouping based on sequence of rows

3 Answers3

Postgres 9.4 or later with aggregate `FILTER`:

In any Postgres version

Linked

Grouping based on sequence of rows

3 Answers3

Postgres 9.4 or later with aggregate FILTER:

In any Postgres version

Linked

Postgres 9.4 or later with aggregate `FILTER`: