7

I have a table of financial data with the following schema:

         Table "public.candles"
   Column   |      Type      | Modifiers 
------------+----------------+-----------
 posix_time | bigint         | not null
 low        | numeric(8,2)   | not null
 high       | numeric(8,2)   | not null
 open       | numeric(8,2)   | not null
 close      | numeric(8,2)   | not null
 volume     | numeric(23,16) | not null
Indexes:
    "candles_pkey" PRIMARY KEY, btree (posix_time)

Each candle spans a one-minute interval. I would like to aggregate the data into candles spanning intervals of 5 minutes, 1 hour, 1 day, etc.

I can aggregate posix_time, high, low, and volume over five minute intervals with

SELECT posix_time/(60*5)*(60*5) AS new_posix_time,
       max(high)                AS new_high,
       min(low)                 AS new_low,
       sum(volume)              AS new_volume
FROM candles
GROUP BY new_posix_time

and calculate the the new open and close values with the appropriate variation of

SELECT posix_time/(60*5)*(60*5) AS new_posix_time,
       open                     AS new_open
FROM (SELECT open,
             posix_time,
             ROW_NUMBER() OVER (PARTITION BY posix_time/(60*5)*(60*5)
                                    ORDER BY posix_time ASC) AS r
      FROM candles
     ) AS o
WHERE o.r = 1

as suggested in this question, but I can't figure out how to combine them into one query.

Do I need to use joins? Subqueries? Totally restructure the query?

Community
  • 1
  • 1
Bryan
  • 193
  • 7

3 Answers3

1

You can use generate_series() to get the timeframe you are looking for. Then you can use left join and aggregation. Something like this:

select t.ts,
       min(low) as low, max(high) as high, sum(volume) as volume
from generate_series('2016-01-01'::timestamp, '2016-01-02'::timestamp, interval '5 minute'
                    ) t(ts) left join
     candles c
     on '1970-01-01' + c.posix_time * interval '1 second' between t.ts and t.ts + interval '5 minute'
group by t.ts;

EDIT:

Getting the open and close time requires one more level of processing:

select ts, min(low) as low, max(high) as high, sum(volume) as volume,
       min(open) as open, min(close) as close
from (select t.*, c.*,
             first_value(open) over (partition by t.ts order by c.posix_time asc) as open,
             first_value(open) over (partition by t.ts order by c.posix_time desc) as close
      from generate_series('2016-01-01'::timestamp, '2016-01-02'::timestamp, interval '5 minute'
                          ) t(ts) left join
           candles c
           on '1970-01-01' + c.posix_time * interval '1 second' between t.ts and t.ts + interval '5 minute'
     ) t
group by ts;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • It seems that this is just an alternative form of the first query. How would I use this to aggregate the `open` and `close` columns as well? – Bryan Jun 10 '16 at 00:00
0

I know you probably don't want to hear this, but using pandas will make your life a lot easier...

sqla = ("SELECT posix_time, open, high, low, close FROM table") 
df   = psql.read_sql(sqla, con)
Timeframe = '5T'
df.resample(Timeframe ).agg({ 'open' : 'first', 'high' : 'max', 'low'  : 'min', 'close': 'last'})
ajsp
  • 2,512
  • 22
  • 34
0

Here is my own solution using custom aggregate functions:

CREATE TYPE open_close_agg_type AS (
    t bigint,
    p numeric(8,2)
);

-- assumes price will never be 0
CREATE OR REPLACE
    FUNCTION close_state_func(open_close_agg_type, open_close_agg_type)
    RETURNS open_close_agg_type LANGUAGE sql
AS $$
    SELECT CASE WHEN ($1).p = 0 THEN $2
                WHEN ($2).p = 0 THEN $1
                WHEN ($1).t > ($2).t THEN $1
                ELSE $2
           END
$$;

-- assumes price will never be 0
CREATE OR REPLACE
    FUNCTION open_state_func(open_close_agg_type, open_close_agg_type)
    RETURNS open_close_agg_type LANGUAGE sql
AS $$
    SELECT CASE WHEN ($1).p = 0 THEN $2
                WHEN ($2).p = 0 THEN $1
                WHEN ($1).t < ($2).t THEN $1
                ELSE $2
           END
$$;

CREATE OR REPLACE
    FUNCTION open_close_agg_finalize_func(open_close_agg_type)
    RETURNS numeric(8,2) LANGUAGE sql
AS $$
    SELECT ($1).p
$$;

CREATE AGGREGATE last_closing(open_close_agg_type) (
    SFUNC=close_state_func,
    STYPE=open_close_agg_type,
    INITCOND='(0, 0.0)',
    FINALFUNC=open_close_agg_finalize_func
);

CREATE AGGREGATE first_opening(open_close_agg_type) (
    SFUNC=open_state_func,
    STYPE=open_close_agg_type,
    INITCOND='(0, 0.0)',
    FINALFUNC=open_close_agg_finalize_func
);

Run the following query:

SELECT posix_time/(60*5)*(60*5)          AS new_posix_time,
       min(low)                          AS new_low,
       max(high)                         AS new_high,
       first_opening((posix_time, open)) AS new_open,
       last_closing((posix_time, close)) AS new_close,
       sum(volume)                       AS new_volume
FROM candles
GROUP BY new_posix_time

I went with Gordon's answer because it's less verbose.

Bryan
  • 193
  • 7