1

I'm trying to do a distinct operation on OHLC data where I have multiple dates per symbol

I can do the operation itself just fine, it only returns date, and symbol

select distinct timestamp, symbol from mv_qs_facts group by mv_qs_facts.symbol, mv_qs_facts.timestamp;

but I'd like it to return all columns (additional: close, open, high, low, volume) as well.

My goal is to return the last distinct (timestamp, symbol)

an idea I had.

select distinct on (timestamp, symbol), close, open, high, low from mv_qs_facts group by mv_qs_facts.symbol, mv_qs_facts.timestamp;

I see it's not as easy as this statement.

I've read I might be able to solve it with a subquery, a temporary table, or a join (all which don't use distinct).

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
thistleknot
  • 1,098
  • 16
  • 38

3 Answers3

2

Use DISTINCT ON ():

SELECT DISTINCT ON (timestamp, symbol)
   timestamp, symbol, close, open, high, low
FROM mv_qs_facts;

This will return close, open, high and low for a random member of the group.

If you want to control which member is used, add an ORDER BY clause, then the first member in this ordering is taken.

If the problem is memory consumption on the client, you should use cursors:

BEGIN;
DECLARE c CURSOR FOR SELECT ...;
FETCH 100 FROM c;
FETCH 100 FROM c;
...
COMMIT;
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Thank you. I have over 2 million records and this runs out of memory. :...( my hope was to do this in psql vs r to try to avoid memory issues but I see I'm still having them. – thistleknot Sep 04 '19 at 02:53
  • I lowered it to 1 same result. Is there a way to partition/group by symbol and then do the operation – thistleknot Sep 04 '19 at 13:22
  • That seems unlikely. Please explain in detail: what component is running out of memory? What is the error message? Is there a memory context dump in the PostgreSQL log? – Laurenz Albe Sep 04 '19 at 13:37
  • ERROR: out of memory for query result – thistleknot Sep 04 '19 at 14:40
  • The keyword is "for query result". I bet it is the client that has problems keeping the result in memory. Try using a cursor and fetch only a couple of results at a time. – Laurenz Albe Sep 04 '19 at 14:51
  • thank you for that advice. I did a create table as [then your query] – thistleknot Sep 04 '19 at 15:55
  • CREATE materialized view temp AS SELECT DISTINCT ON (timestamp, symbol) id, timestamp, symbol, close, open, high, low FROM qs_facts ORDER BY id desc; produces error: "SELECT DISTINCT ON expressions must match initial ORDER BY expressions". So this doesn't allow me to select the last distinct id per symbol, timestamp – thistleknot Sep 04 '19 at 23:23
  • got it! CREATE materialized view temp AS SELECT DISTINCT ON (symbol, timestamp) id, timestamp, symbol, close, open, high, low FROM qs_facts order by symbol, timestamp, id desc; thanks @RhodiumToad on #postgresql on irc.freenode.net – thistleknot Sep 05 '19 at 01:02
0

Here's your query.

select distinct t1.* from (
select row_number() over (partition by symbol order by timestamp desc) as rn, * from 
mv_qs_facts) as t1
    where t1.rn = 1
Ed Bangga
  • 12,879
  • 4
  • 16
  • 30
  • error "select row_number() over (partition by s.symbol, order by ti..." with order by... I was able to run the subquery (or whatever it's called) without order by and s.symbol (i.e. just symbol) as such "select row_number() over (partition by symbol) as rn, timestamp, symbol from mv_qs_facts". The whole command works "select t1.symbol, t1.timestamp from ( select row_number() over (partition by symbol) as rn, timestamp, symbol from mv_qs_facts) as t1 where t1.rn = 1" checking if it works now. It still only returns symbol and timestamp. Which my original command did – thistleknot Sep 02 '19 at 02:21
  • updated, removed the comma in between partition and order by – Ed Bangga Sep 02 '19 at 02:23
  • added order by to get only the latest timestamp, but this one depends to you. – Ed Bangga Sep 02 '19 at 02:24
  • the query runs. But only returns 2 col symbol, and timestamp. My original query "select distinct timestamp, symbol from mv_qs_facts group by mv_qs_facts.symbol, mv_qs_facts.timestamp;" does the same. What I want to do is return open, high, low, close, volume columns as well. I suspect I can add those columns in somewhere? – thistleknot Sep 02 '19 at 02:27
  • i think you need to join again the table on its ID to get other columns. – Ed Bangga Sep 02 '19 at 02:29
  • well that was my question. – thistleknot Sep 02 '19 at 02:31
  • updated the answer. please try replace 'id' to your actual column – Ed Bangga Sep 02 '19 at 02:31
  • I might have a major/minor issue. I set my key to timestamp, date and I find I have duplicate dates... in other words I don't have an autonumber id... which may or may not be my issue. I don't know I'd join without that. on a side note. I did find this: https://stackoverflow.com/questions/6127338/sql-mysql-select-distinct-unique-but-return-all-columns by garret simpson that looks like it would run if I had id as well – thistleknot Sep 02 '19 at 02:37
  • got it, you don't need to join. you can select full columns. updated the answer – Ed Bangga Sep 02 '19 at 02:40
  • that returned unique symbols... vs timestamp, symbol combinations. Truncating down to 1 date per symbol. It did return all columns though. Sorry took me so long to respond, I was reworking my table to have id's... which didn't exactly work. But I don't think that was the issue (my new id column, as they are all nulls) – thistleknot Sep 02 '19 at 04:31
  • I fixed my table to now have an id column. So whichever method works best. I am about to try this once more – thistleknot Sep 02 '19 at 08:47
  • It's still returning just 1 symbol and 1 date per symbol which isnt what I wanted. I need it to return all distinction dates per symbol – thistleknot Sep 02 '19 at 19:41
0

I added id because I initially had timestamp and date as my composite key but it turned out to be bad with duplicate dates and I needed something to reference to sort by

CREATE materialized view temp AS  
SELECT DISTINCT ON (symbol, timestamp)
   id, timestamp, symbol, close, open, high, low
FROM qs_facts order by symbol, timestamp, id desc;
thistleknot
  • 1,098
  • 16
  • 38