27

I have a table like this (but with more columns):

Code Quantity
00001 1
00002 1
00002 1
00002 2
00003 2
00003 1

And I want to get the same result as with SELECT DISTINCT Code FROM table (00001,00002,00003) but with all of the other table columns.

If I perform SELECT DISTINCT Code, Quantity from table I get:

Code Quantity
00001 1
00002 1
00002 2
00003 1
00003 2

But I would like to get:

Code Quantity
00001 1
00002 1
00003 1
user4157124
  • 2,809
  • 13
  • 27
  • 42
javiazo
  • 1,892
  • 4
  • 26
  • 41

3 Answers3

49

Assuming you are using MySQL (as the question is tagged), the following will return an arbitrary value for the other columns:

select *
from t
group by code;

However, the particular values being selected come from indeterminate rows.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
11

SQLite deterministic min/max

To also guarantee that you will get the row with minimal quantity deterministically as in the desired output, you can simply add it as min(quantity) to the select as in:

select *, min(quantity)
from t
group by code;

As mentioned at: row with max value per group - SQLite

SQLite docs guarantee that this works https://www.sqlite.org/lang_select.html#bareagg

Special processing occurs when the aggregate function is either min() or max(). Example:

SELECT a, b, max(c) FROM tab1 GROUP BY a;

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.

This is an SQL extension and not portable across RDMSs.

PostgreSQL DISTINCT ON

In PostgreSQL 13.5 you can't GROUP by columns that are not either aggregates or PRIMARY: Select first row in each GROUP BY group?

But PostgreSQL has the SELECT DISTINCT ON extension which solves the use case nicely: https://www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT as it allows you to specify separately what needs to be distinct and what needs to be returned:

select distinct on (code) *
from t
group by code
order by code ASC, quantity ASC

This query would also deterministically pick the entries with lowest quantity, since order by is respected when choosing which column to pick.

SELECT DISTINCT ON was WONTFIXed in SQLite: https://code.djangoproject.com/ticket/22696 While this simple case can be achieved by both DBMSs, SELECT DISTINCT is simply more general than SQLite's magic min/max currently, e.g. it can handle multiple columns which SQLite says is not possible with its implementation.

RANK and ROW_NUMBER window functions work for both SQLite and PostgreSQL

This method is the most versatile I've seen so far, and the exact same code works on both SQLite and PostgresSQL:

SELECT *
FROM (
    SELECT
      ROW_NUMBER() OVER (
        PARTITION BY "name"
        ORDER BY "population" DESC
      ) AS "rnk",
      *
    FROM "table"
    WHERE "name" IN ('a', 'b', 'c')
  ) sub
WHERE "sub"."rnk" = 1
ORDER BY
  "sub"."name" ASC,
  "sub"."population" DESC

That exact same code works on both:

Furthermore, we can easily modify that query to cover the following related use cases:

  • if you replace ROW_NUMBER() with RANK(), it returns all ties for the max if more than one row reaches the max
  • if you replace "sub"."rnk" = 1 with "sub"."rnk" <= n you can get the top n per group rather than just the top 1

We are required to do a subquery in our code, you can't WHERE "sub"."rnk" = 1 from inside the same query where you are doing ROW_NUMBER: Window functions partition and order without subquery

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
6

Building up on Gordon's answer, you can order a sub-query so that the group by will always return the lowst quantity for each code.

select *
from (select * from t order by code desc, quantity asc)
group by code;
Adrien
  • 71
  • 1
  • 2