Translate query with GROUP BY from MySQL to Postgres and SQLite

Question

In MySQL I use a lot of constructs which look like this:

LEFT JOIN(
    SELECT field_one, field_two, field_three FROM (
        SELECT field_one, field_two, field_three FROM table_one
        ORDER BY field_three 
    ) o GROUP BY field_one
) abc ON cur_table.some_field = abc.field_one

I use it, because it helps me to implement a usual task - to order some data before grouping. Now I wonder how to migrate this SQL construct from MySQL to SQLite and PostgreSQL.

Prepare http://sqlfiddle.com with sample data. It will be much easier to provide accurate answer — Lukasz Szozda, Jan 03 '16 at 20:20
I did not try it, because I've red somewhere that in Postgresql, for example, they do the similar thing in some other way (which looked rather ugly to me). So I thought that it is not standard. — Jacobian, Jan 03 '16 at 20:25
the `order by` in the derived table doesn't make sense and servers no practical purpose because the "order" of rows of an intermediate step in the query is irrelevant. Plus: `) o group by field_one` is invalid SQL (you can't "group" a table alias). Plus: the `group by` doesn't make any sense because you are not using any aggregates. I don't understand what you are trying to achieve with this. What is the underlying problem you are trying to solve? Why don't you just join those two tables directly? What is the reason for the useless `order by` and wrong `group by`? — , Jan 03 '16 at 20:30
@a_horse_with_no_name. Let's say that `field_three` is some auto_increment field, `field_one` is foreign key which points to the field `some_field` from the current table (which has alias `cur_table` in my example). And the real world task is to get some important data (from field_two), one per each `some_field` in the current table. The task may be to get the first or the last inserted data in `table_one`. Therefore, I indeed need `order by`, it indeed makes sense, and I indeed need grouping, and more than that I do not need any aggregates. — Jacobian, Jan 03 '16 at 20:38
@Jacobian No, it does not make sense. It is useless. check the query plan and you will see that is has been removed (so it is harmless, too) — wildplasser, Jan 03 '16 at 23:30

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

I suspect the purpose of your query is to use the MySQL quirk to pick an arbitrary row for un-aggregated columns that are also not listed in the existing GROUP BY clause (which is against the SQL standard and not supported by most other RDBMS). By ordering in a subquery you make MySQL pick the row with the smallest field_three per gropup, so I'll assume you want:

the smallest field_three for every field_one, and field_two from the same row to go with it.

Your original query does not work in Postgres which follows the SQL standard here. If a SELECT has a GROUP BY clause, all output columns have to be listed or be aggregated. Consider:

How do I port query with GROUP BY clause to PostgreSQL?

Standard SQL with window function

One possible standard-SQL solution would be with the window function row_number() in a subquery:

SELECT field_one, field_two, field_three
FROM  (
   SELECT field_one, field_two, field_three
        , row_number() OVER(PARTITION BY field_one ORDER BY field_three) rn
   FROM   table_one
   ) sub
WHERE sub.rn = 1

Works in Postgres, but not in SQLite or MySQL which do not support window functions, yet.

Basic standard SQL

This query works in all three RDBMS (and almost anywhere else, too), but requires unique maxima in field_three (returns multiple rows if there are ties for a maximum field_three per field_one).

SELECT t1.*
FROM   table_one t1
LEFT   JOIN table_one t2 ON t1.field_one = t2.field_one
                        AND t1.field_three < t2.field_three
WHERE  t2.field_one IS NULL

You can resolve ties if you have any unique (set of) column(s), but it's unwieldy. Related:

Fetch the row which has the Max value for a column

Postgres

(Besides supporting all standard SQL solutions) Postgres also has the powerful DISTINCT ON (an extension of the standard DISTINCT, but not against the standard like the MySQL and SQLite quirks):

SELECT DISTINCT ON (field_one)
       field_one, field_two, field_three
FROM   table_one
ORDER  BY field_one, field_three

You can resolve ties by adding more columns to the ORDER BY. Details:

Select first row in each GROUP BY group?

SQLite

... has a similar quirk as MySQL (violating the SQL standard). From the release notes:

Queries of the form: "SELECT max(x), y FROM table" returns the value of y on the same row that contains the maximum x value.

So:

SELECT field_one, field_two, max(field_three) AS field_three
FROM   table_one
GROUP  BY field_one

field_two is taken from the row of max(field_three). Related:

How can I select the set of rows where each item has the greatest timestamp?

Join and join condition work the same everywhere:

LEFT JOIN (SELECT ...) abc ON cur_table.some_field = abc.field_one

+1, and you may be able to beat the row_number() solution with an APPLY operation... but it's _still_ not supported by MySql. — Joel Coehoorn, Jan 04 '16 at 05:08
Thank you, sir! Does your last comment mean that the solution from my question will work both in `SQLite` and `Postgres` and probably in other databases, even though, as a_horse_with_no_name and wildplasser said, `group by` will be removed from the query plan? — Jacobian, Jan 04 '16 at 07:22
@Jacobian: Your original does not work in Postgres. Consider the added explanation above. — Erwin Brandstetter, Jan 04 '16 at 15:15

Translate query with GROUP BY from MySQL to Postgres and SQLite

1 Answers1

Standard SQL with window function

Basic standard SQL

Postgres

SQLite