0

I have the following view:

CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT DISTINCT ON (coin) * 
from {accountId}.{tableAssetsName} 
ORDER BY coin, ts DESC;

The view returns, for each coin, the latest update.

and, in an unrelated question, a comment was:

Note: DISTINCT is always a red flag. (almost always)

I read about this and it seems to have a performance issue with Postgres. In my scenario, I don't have any issues with performance, but I would like to understand:

how could such a query be rewritten to not use DISTINCT then?

Thomas
  • 10,933
  • 14
  • 65
  • 136
  • The performance issue doesn't comes from the `DISTINCT` clause, but from the underlying `ORDER BY` clause which is required so that to select the "rigth" data with the `DISTINCT` clause. If you face performance issue with the `ORDER BY` clause, then you can use a `MATERIALIZED` view with a specific index to make it faster. – Edouard Nov 04 '21 at 09:58
  • 3
    Postgresql's `DISTINCT ON` and `DISTINCT` are quite different animals. There are a few ways to rewrite the query so that it emulates `DISTINCT ON` but they will almost certainly be less performant. [Here](https://stackoverflow.com/questions/9795660/postgresql-distinct-on-with-different-order-by) is a good SQ thread. – Stefanov.sm Nov 04 '21 at 10:27

1 Answers1

1

There is a substantial difference between DISTINCT and the Postgres proprietary DISTINCT ON () operator.

Does whoever wrote that "DISTINCT is always a red flag" actually know the difference between DISTINCT and DISTINCT ON in Postgres?

The problem that your view solves, is known as and in Postgres distinct on is typically more efficient than the alternatives.

The problem can be solved with e.g. window functions:

CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT ....
FROM (
  SELECT *, 
         row_number() over (partition by coin order by ts desc) as rn
  from {accountId}.{tableAssetsName}   
) t 
WHERE rn = 1;

But I wouldn't be surprised if that is actually slower than you current solution.