2

I have a table of orders with a column denoting whether it's a buy or a sell, with the rows typically ordered by timestamp. What I'd like to do is operate on groups of consecutive buys, plus their sell. e.g. B B S B S B B S -> (B B S) (B S) (B B S)

Example:

order_action |      timestamp      
-------------+---------------------
buy          | 2013-10-03 13:03:02
buy          | 2013-10-08 13:03:02
sell         | 2013-10-10 15:58:02
buy          | 2013-11-01 09:30:02
buy          | 2013-11-01 14:03:02
sell         | 2013-11-07 10:34:02
buy          | 2013-12-03 15:46:02
sell         | 2013-12-09 16:00:03
buy          | 2013-12-11 13:02:02
sell         | 2013-12-18 15:59:03

I'll be running an aggregation function in the end (the groups are so that I can exclude an entire group based on its sell order), so GROUP BY or partitioned windows seemed like the right way to go, but I can't figure out how to get this specific grouping.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Alex
  • 21
  • 2
  • 1
    If you have any attempts, please post your code – Barranka Jul 11 '14 at 23:53
  • Can you add a sample of the output you would like to see? – harmic Jul 11 '14 at 23:57
  • Will every group of buys only have one sell? – Bob Jul 12 '14 at 00:06
  • `What I'd like to do is operate on groups of consecutive buys...` What is `operate`? That is, what do you actually want to **do**? And what problem are you having with doing it? It's tricky suggesting solutions for undefined problems. – user2338816 Jul 12 '14 at 01:30
  • I've asked a similar question before, it may be helpful. http://stackoverflow.com/questions/18889056/finding-a-run-of-rows-from-an-ordered-result-set – Josh Smeaton Jul 12 '14 at 01:54

3 Answers3

4

This can be surprisingly simple with count() as window aggregate function:

Postgres 9.4 or later with aggregate FILTER:

SELECT *, count(*) FILTER (WHERE order_action = 'sell')
                   OVER (ORDER BY ts DESC) AS grp
FROM   orders
ORDER  BY ts;

Or with ascending grp numbers:

SELECT *, count(*) FILTER (WHERE order_action = 'sell')
          OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM   orders
ORDER  BY ts;

See:

Using ts instead as timestamp as column name to avoid reserved words as identifiers.

count() returns a running count with the default frame definition from the start of the frame (the whole table in this case) up to the (last peer of) the current row. The running count of sells groups your rows as requested.
I am ordering descending in the OVER clause to let each group end at a trailing "sell", not a leading "sell". This results in descending group numbers. But that should not matter, you just needed group numbers.
Duplicate timestamps would be a problem (in any case!).

One way for ascending group numbers: use a custom FRAME definition for the window function:

In any Postgres version

SELECT *, count(order_action = 'sell' OR NULL) OVER (ORDER BY ts DESC) AS grp
FROM   orders
ORDER  BY ts;
SELECT *, count(order_action = 'sell' OR NULL)
          OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM   orders;

count() only counts non-null values. The expression order_action = 'sell' OR NULL results in TRUE for 'sell' and NULL otherwise.

db<>fiddle here - demonstrating all.
Old sqlfiddle

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

I don't have PostgreSQL, so i tried this over on SQL Fiddle

with sells as (
  select
    rank() over w grp,
    lag(timestamp,1,'2000-01-01') over w sd,
    timestamp td
  from
    orders
  where
    order_action = 'sell'
  window w as (order by timestamp)
)
select
  s.grp,
  o.order_action,
  o.timestamp
from
  orders o
join
  sells s
    on o.timestamp > s.sd
    and o.timestamp <= s.td
order by o.timestamp

Let me know if this works for you. This was my first time using PostgreSQL and I like it.

Bob
  • 1,045
  • 8
  • 10
1

You can characterize the groups by counting the number of sells at or later than each row. You can do this with a cumulative sum to get the group that can then be used for aggregation. Here is an example:

select min(timestamp), max(timestamp), sum(case when order_action = 'buy' then 1 else 0 end) as buys
from (select o.*,
             sum(case when order_action = 'sell' then 1 else 0 end) over
                 (order by timestamp desc) as grp
      from orders o
     ) o
group by grp
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786