SQLite deterministic min
/max
To also guarantee that you will get the row with minimal quantity deterministically as in the desired output, you can simply add it as min(quantity)
to the select as in:
select *, min(quantity)
from t
group by code;
As mentioned at: row with max value per group - SQLite
SQLite docs guarantee that this works https://www.sqlite.org/lang_select.html#bareagg
Special processing occurs when the aggregate function is either min() or max(). Example:
SELECT a, b, max(c) FROM tab1 GROUP BY a;
When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.
This is an SQL extension and not portable across RDMSs.
PostgreSQL DISTINCT ON
In PostgreSQL 13.5 you can't GROUP by columns that are not either aggregates or PRIMARY: Select first row in each GROUP BY group?
But PostgreSQL has the SELECT DISTINCT ON
extension which solves the use case nicely: https://www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT as it allows you to specify separately what needs to be distinct and what needs to be returned:
select distinct on (code) *
from t
group by code
order by code ASC, quantity ASC
This query would also deterministically pick the entries with lowest quantity, since order by
is respected when choosing which column to pick.
SELECT DISTINCT ON
was WONTFIXed in SQLite: https://code.djangoproject.com/ticket/22696 While this simple case can be achieved by both DBMSs, SELECT DISTINCT
is simply more general than SQLite's magic min/max currently, e.g. it can handle multiple columns which SQLite says is not possible with its implementation.
RANK
and ROW_NUMBER
window functions work for both SQLite and PostgreSQL
This method is the most versatile I've seen so far, and the exact same code works on both SQLite and PostgresSQL:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER (
PARTITION BY "name"
ORDER BY "population" DESC
) AS "rnk",
*
FROM "table"
WHERE "name" IN ('a', 'b', 'c')
) sub
WHERE "sub"."rnk" = 1
ORDER BY
"sub"."name" ASC,
"sub"."population" DESC
That exact same code works on both:
Furthermore, we can easily modify that query to cover the following related use cases:
- if you replace
ROW_NUMBER()
with RANK()
, it returns all ties for the max
if more than one row reaches the max
- if you replace
"sub"."rnk" = 1
with "sub"."rnk" <= n
you can get the top n per group rather than just the top 1
We are required to do a subquery in our code, you can't WHERE "sub"."rnk" = 1
from inside the same query where you are doing ROW_NUMBER
: Window functions partition and order without subquery