how to remove DISTINCT from query, in Postgresql

Question

I have the following view:

CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT DISTINCT ON (coin) * 
from {accountId}.{tableAssetsName} 
ORDER BY coin, ts DESC;

The view returns, for each coin, the latest update.

and, in an unrelated question, a comment was:

Note: DISTINCT is always a red flag. (almost always)

I read about this and it seems to have a performance issue with Postgres. In my scenario, I don't have any issues with performance, but I would like to understand:

how could such a query be rewritten to not use DISTINCT then?

The performance issue doesn't comes from the `DISTINCT` clause, but from the underlying `ORDER BY` clause which is required so that to select the "rigth" data with the `DISTINCT` clause. If you face performance issue with the `ORDER BY` clause, then you can use a `MATERIALIZED` view with a specific index to make it faster. — Edouard, Nov 04 '21 at 09:58
Postgresql's `DISTINCT ON` and `DISTINCT` are quite different animals. There are a few ways to rewrite the query so that it emulates `DISTINCT ON` but they will almost certainly be less performant. [Here](https://stackoverflow.com/questions/9795660/postgresql-distinct-on-with-different-order-by) is a good SQ thread. — Stefanov.sm, Nov 04 '21 at 10:27

score 1 · Accepted Answer · answered Nov 04 '21 at 10:54

There is a substantial difference between DISTINCT and the Postgres proprietary DISTINCT ON () operator.

Does whoever wrote that "DISTINCT is always a red flag" actually know the difference between DISTINCT and DISTINCT ON in Postgres?

The problem that your view solves, is known as greatest-n-per-group and in Postgres distinct on is typically more efficient than the alternatives.

The problem can be solved with e.g. window functions:

CREATE OR REPLACE VIEW {accountId}.last_assets AS
SELECT ....
FROM (
  SELECT *, 
         row_number() over (partition by coin order by ts desc) as rn
  from {accountId}.{tableAssetsName}   
) t 
WHERE rn = 1;

But I wouldn't be surprised if that is actually slower than you current solution.

how to remove DISTINCT from query, in Postgresql

1 Answers1