0

I have this Postgres SQL query in which I would like to add DISTINCT:

SELECT pairs, a.change_id, user_size, user_mile, b.change_short_name
FROM order_data a
FULL OUTER JOIN changes b
ON a.change_id = b.change_id
ORDER BY a.created_at ASC;

I tried this:

SELECT DISTINCT pairs, a.change_id, user_size, user_mile, b.change_short_name
FROM order_data a
FULL OUTER JOIN changes b
ON a.change_id = b.change_id
ORDER BY a.created_at ASC;

I get error: [42P10] ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list Position: 280

I need to use DISTINCT for pairs, a.change_id and biggest a.created_at.

What is the proper way to implement this?

user1285928
  • 1,328
  • 29
  • 98
  • 147
  • Show us some sample table data and the expected result - all as formatted text (not images.) [mcve]! – jarlh Oct 11 '21 at 12:20
  • You say: `I need to use DISTINCT for pairs, a.change_id` And you are sure you didn't really mean `DISTINCT for pairs, COALESCE(a.change_id, b.change_id)`? Else the `FULL JOIN` doesn't seem to make a lot of sense, and a `LEFT JOIN` would seem more appropriate. Also, does the sort order of the result matter? Or do you just want to pick the oldest row from each set of peers? Please table-qualify all columns and (always) declare your version of Postgres. – Erwin Brandstetter Oct 11 '21 at 12:43
  • The core issue of your question is this: `I would like to add DISTINCT:`. But we'd rather need to know what you really want to achieve. I don't suppose you just fell in love with the `DISTINCT` key word. – Erwin Brandstetter Oct 11 '21 at 12:47
  • I have a table who stores several versions of data of temperature measurement. The idea is to execute the query and if we have different data to return result. – user1285928 Oct 11 '21 at 12:49

3 Answers3

2

Do a GROUP BY instead of SELECT DISTINCT. Use MAX() in the ORDER BY clause:

SELECT pairs, a.change_id, user_size, user_mile, b.change_short_name
FROM order_data a
FULL OUTER JOIN changes b
  ON a.change_id = b.change_id
GROUP BY pairs, a.change_id, user_size, user_mile, b.change_short_name
ORDER BY MAX(a.created_at) ASC;
jarlh
  • 42,561
  • 8
  • 45
  • 63
1

Use DISTINCT ON. Pls. note the ORDER BY clause. Some good examples/discussion here.

SELECT DISTINCT ON (pairs, a.change_id)
    pairs, a.change_id, user_size, user_mile, b.change_short_name
FROM order_data a
FULL OUTER JOIN changes b ON a.change_id = b.change_id
ORDER BY pairs, a.change_id, a.created_at DESC;
Stefanov.sm
  • 11,215
  • 2
  • 21
  • 21
1

Maybe this:

SELECT pairs, change_id, user_size, user_mile, change_short_name
FROM  (
   SELECT DISINCT ON (pairs, change_id)
          pairs, change_id, user_size, user_mile, b.change_short_name, a.created_at
   FROM   order_data a
   FULL   JOIN changes b USING (change_id)
   ORDER  BY pairs, change_id, a.created_at
ORDER  BY created_at;

Note the USING clause which merges a.change_id and b.change_id to effectively COALESCE(a.change_id, b.change_id).

The outer SELECT is to eliminate created_at from the result while still sorting by it. It's not entirely clear from the question whether you want that. Your original query does it.

Details depend on undisclosed information: exact table definition, cardinalities, exact requirements. There may be much faster solutions for many duplicates.

See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228