3

I have a table named trades for holding currency trading data with the following schema:

id        - uuid
timestamp - timestamp without time zone
price     - numeric

I would like to be able to query in a way that I can build a candle chart. For this I need the first price, the last price, the max price and the min price, grouped by time intervals. So far I have this:

CREATE FUNCTION ts_round( timestamptz, INT4 ) RETURNS TIMESTAMPTZ AS $$
SELECT 'epoch'::timestamptz
     + '1 second'::INTERVAL * ( $2 * ( extract( epoch FROM $1 )::INT4 / $2 ) );
$$ LANGUAGE SQL;

SELECT ts_round( timestamp, 300 ) AS interval_timestamp
     , max(price) AS max, min(price) AS min
FROM trades
GROUP BY interval_timestamp
ORDER BY interval_timestamp DESC

How do I get the first price and last price within these intervals?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Matt Beedle
  • 171
  • 1
  • 9
  • How are you defining `first price` and `last price`? By timestamp? – David Faber Jan 14 '15 at 23:19
  • I seriously doubt that your function does what you actually want. It's effectively *truncating* to *n*-second intervals, after *rounding* to full seconds. I suspect your really want to either truncate ***or*** round to *n*-second intervals. Right? – Erwin Brandstetter Jan 15 '15 at 00:07
  • 1
    possible duplicate of [Given time/interval to calculate open/high/low/close value in each grouped data](http://stackoverflow.com/questions/27399054/given-time-interval-to-calculate-open-high-low-close-value-in-each-grouped-data) – radar Jan 15 '15 at 03:10
  • Erwin, thanks for the feedback! Could you suggest how the function could be improved? Or should I create a new question for that? – Matt Beedle Jan 15 '15 at 07:24
  • @MattBeedle: What should it do exactly? – Erwin Brandstetter Jan 15 '15 at 08:02
  • I just need a way to group the trades into arbitrary time intervals to calculate min/max/first/last etc during that timeframe. I took the function from http://www.depesz.com/2010/10/22/grouping-data-into-time-ranges/. Perhaps I can get rid of the function completely? – Matt Beedle Jan 15 '15 at 09:10
  • The question that you linked to in your answer (http://stackoverflow.com/questions/27399054/given-time-interval-to-calculate-open-high-low-close-value-in-each-grouped-data/27399571#27399571) is exactly what I am trying to do, except with a the interval time being variable. – Matt Beedle Jan 15 '15 at 09:11
  • A much easier way to get the timestamp interval is using the floor function with a unix timestamp. This would return in n intervals (n is in minutes) floor(unix_timestamp/(n*60)) – Mitchell Leefers Mar 21 '22 at 02:59

3 Answers3

5

I think this is the query you want:

SELECT ts_round( timestamp, 300 ) AS interval_timestamp,
       max(firstprice) as firstprice,
       max(lastprice) as lastprice,
       max(price) AS maxprice, min(price) AS minprice
FROM (SELECT t.*,
             first_value(price) over (partition by ts_round(timestamp, 300) order by timestamp) as firstprice,
             first_value(price) over (partition by ts_round(timestamp, 300) order by timestamp desc) as lastprice
      FROM trades t
     ) t
GROUP BY interval_timestamp
ORDER BY interval_timestamp DESC;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • One can have a window function over the result of an aggregate function on the same query level, but not the other way round. While `min()` and `max()` can be used for either purpose, `first_value()` and `last_value()` are dedicated window functions. This is syntactical **nonsense**, you would need a subquery to make it work. Doesn't seem to keep people from voting for it, though. – Erwin Brandstetter Jan 15 '15 at 02:13
  • **Still incorrect** after your fix. The default frame definition for window frames is `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. In your subquery you effectively get the price of the *current* row as `lastprice`. After `max(lastprice)` in the outer query you effectively return `max(price)` a second time, which is nonsense. Still doesn't keep people from voting for it. – Erwin Brandstetter Jan 15 '15 at 03:32
  • Instead of calling two times first_value, you can call first_value and last_value on same window function. Query will be faster. – Mr Jedi Nov 24 '17 at 23:02
2

This uses a single window for all window functions and no subquery. Should be faster than the currently accepted answer.

SELECT DISTINCT ON (1)
       ts_round(timestamp, 300) AS interval_timestamp
     , min(price)         OVER w AS min_price
     , max(price)         OVER w AS max_price
     , first_value(price) OVER w AS first_price
     , last_value(price)  OVER w AS last_price
FROM   trades
WINDOW w AS (PARTITION BY ts_round(timestamp, 300) ORDER BY timestamp
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY 1 DESC;

To define "first" and "last" per timestamp, this column needs to be unique or the query is ambiguous and yo get an arbitrary pick from equal peers.

Similar answer with explanation for the custom window frame:

Explanation for the reference by ordinal numbers:

Aside: don't use "timestamp" as identifier. It's a basic type name, which is error-prone.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

This can also be done without creating a confusing function and just using the built-in floor function with a unix timestamp.

SELECT symbol, timestamp AS interval_timestamp,
   max(firstprice) as firstprice,
   max(lastprice) as lastprice,
   max(price) AS maxprice, min(price) AS minprice, 
   max(vol)-min(vol) as volume
FROM (SELECT t.*,
first_value(price) over (partition by 
floor(unix_timestamp(timestamp)/(5*60)) order by timestamp) as 
firstprice,
first_value(price) over (partition by 
floor(unix_timestamp(timestamp)/(5*60)) order by timestamp desc) as 
lastprice
  FROM trades t
 ) t
GROUP BY floor(unix_timestamp(timestamp)/(5*60))
ORDER BY timestamp

Note here you are turning the timestamp into a unix timestamp (if you are storing unix timestmaps no need to convert) and then dividing that by number of minutes * 60. So, in this example we are returning 5 minutes intervals. I also added in the trade volume for the time interval because, why not?

Mitchell Leefers
  • 170
  • 1
  • 13