How to SELECT DISTINCT of one column and get the others?

Question

I have a table like this (but with more columns):

Code	Quantity
00001	1
00002	1
00002	1
00002	2
00003	2
00003	1

And I want to get the same result as with SELECT DISTINCT Code FROM table (00001,00002,00003) but with all of the other table columns.

If I perform SELECT DISTINCT Code, Quantity from table I get:

Code	Quantity
00001	1
00002	1
00002	2
00003	1
00003	2

But I would like to get:

Code	Quantity
00001	1
00002	1
00003	1

score 49 · Accepted Answer · answered Jun 24 '13 at 13:56

49

Assuming you are using MySQL (as the question is tagged), the following will return an arbitrary value for the other columns:

select *
from t
group by code;

However, the particular values being selected come from indeterminate rows.

answered Jun 24 '13 at 13:56

Gordon Linoff

1,242,037
58
646
786

6

The question is for SQLite; the answer is the same regardless. – lilbyrdie Apr 28 '14 at 15:43

Ciro Santilli OurBigBook.com · Answer 2 · 2022-08-10T06:59:22.047

SQLite deterministic min/max

To also guarantee that you will get the row with minimal quantity deterministically as in the desired output, you can simply add it as min(quantity) to the select as in:

select *, min(quantity)
from t
group by code;

As mentioned at: row with max value per group - SQLite

SQLite docs guarantee that this works https://www.sqlite.org/lang_select.html#bareagg

Special processing occurs when the aggregate function is either min() or max(). Example:
SELECT a, b, max(c) FROM tab1 GROUP BY a;
When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.

This is an SQL extension and not portable across RDMSs.

PostgreSQL DISTINCT ON

In PostgreSQL 13.5 you can't GROUP by columns that are not either aggregates or PRIMARY: Select first row in each GROUP BY group?

But PostgreSQL has the SELECT DISTINCT ON extension which solves the use case nicely: https://www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT as it allows you to specify separately what needs to be distinct and what needs to be returned:

select distinct on (code) *
from t
group by code
order by code ASC, quantity ASC

This query would also deterministically pick the entries with lowest quantity, since order by is respected when choosing which column to pick.

SELECT DISTINCT ON was WONTFIXed in SQLite: https://code.djangoproject.com/ticket/22696 While this simple case can be achieved by both DBMSs, SELECT DISTINCT is simply more general than SQLite's magic min/max currently, e.g. it can handle multiple columns which SQLite says is not possible with its implementation.

RANK and ROW_NUMBER window functions work for both SQLite and PostgreSQL

This method is the most versatile I've seen so far, and the exact same code works on both SQLite and PostgresSQL:

SELECT *
FROM (
    SELECT
      ROW_NUMBER() OVER (
        PARTITION BY "name"
        ORDER BY "population" DESC
      ) AS "rnk",
      *
    FROM "table"
    WHERE "name" IN ('a', 'b', 'c')
  ) sub
WHERE "sub"."rnk" = 1
ORDER BY
  "sub"."name" ASC,
  "sub"."population" DESC

That exact same code works on both:

SQLite 3.34.0
PostgreSQL 14.3, e.g. it also answers the PostgreSQL: PostgreSQL - fetch the rows which have the Max value for a column in each GROUP BY group

Furthermore, we can easily modify that query to cover the following related use cases:

if you replace ROW_NUMBER() with RANK(), it returns all ties for the max if more than one row reaches the max
if you replace "sub"."rnk" = 1 with "sub"."rnk" <= n you can get the top n per group rather than just the top 1

We are required to do a subquery in our code, you can't WHERE "sub"."rnk" = 1 from inside the same query where you are doing ROW_NUMBER: Window functions partition and order without subquery

score 6 · Answer 3 · answered Mar 09 '21 at 14:13

6

Building up on Gordon's answer, you can order a sub-query so that the group by will always return the lowst quantity for each code.

select *
from (select * from t order by code desc, quantity asc)
group by code;

answered Mar 09 '21 at 14:13

Adrien

71
1
2

The subquery is not needed for SQLite which magically guarantees that `min` and `max` will return the matching column: https://stackoverflow.com/a/71924314/895245 – Ciro Santilli OurBigBook.com Apr 19 '22 at 11:32

How to SELECT DISTINCT of one column and get the others?

3 Answers3

Linked