259
ID   FirstName   LastName
1      John        Doe
2      Bugs        Bunny
3      John        Johnson

I want to select DISTINCT results from the FirstName column, but I need the corresponding ID and LastName.

The result set needs to show only one John, but with an ID of 1 and a LastName of Doe.

xkeshav
  • 53,360
  • 44
  • 177
  • 245
m r
  • 2,879
  • 3
  • 17
  • 10
  • 1
    You want the last name belonging to the lowest ID with a distinct first name? – Thomas Langston May 11 '11 at 15:57
  • 3
    What is the logic that should go into the selection of the top one? I would think you would want both John Doe and John Johnson to show up since they are two distinct Johns but that is just me. – judda May 11 '11 at 16:00
  • 4
    `DISTINCT` is not a function. All answers with `DISTINCT()` are wrong. The error will show up when you do not place it after `SELECT`. – Question Overflow Feb 23 '14 at 07:46
  • 1
    `ALL` *answers* using parentheses after the word distinct are indeed wrong. **Distinct is NOT a function** so it cannot accept a parameter. The parentheses following distinct are simply ignored. Unless you are using PostgreSQL where the parentheses will form a "complex data type" – Paul Maxwell Feb 14 '16 at 02:14

12 Answers12

255

try this query

 SELECT ID, FirstName, LastName FROM table GROUP BY(FirstName)
xkeshav
  • 53,360
  • 44
  • 177
  • 245
  • 26
    How do we know which row will be returned? – William Entriken May 05 '13 at 16:51
  • 41
    @Full Decent you can't, according to MySQL [documentation](http://dev.mysql.com/doc/refman/5.1/en/group-by-extensions.html): "The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.". In practice I've successfully used this kind of queries with ORDER BY clause, for instance you could add ORDER BY id ASC/DESC and MySQL would return consistent results every time you would execute the query. But i would be sure whether anyone should use undocumented features in production environment. – Arunas Junevicius Jun 03 '13 at 14:54
  • 2
    O.P. doesn't mention mysql version. – xkeshav Jun 11 '13 at 04:43
  • I've been searching a lot regarding this issue in MySQL 5.7 and nothing is coming up, seriously! – sinaza Nov 06 '16 at 08:35
  • 2
    @sinaza see my updated answer for MySQL `5.7.5+` for the changed [`GROUP BY` handling](https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html) – Will B. Aug 14 '17 at 23:05
  • 1
    Attention! This is very bad solution for big data. Because GROUP BY sorts through all the lines, but selected only 2-3 rows. – frops Dec 08 '17 at 10:52
  • 10
    This doesn't work with only_full_group_by mode because neither ID nor LastName are neither aggregated nor part of the grouping function. Help! – ihodonald May 07 '19 at 01:06
  • 1
    https://stackoverflow.com/a/58808461/1770571 This is so helpful – Salma Gomaa Jun 03 '20 at 07:41
  • this isn't a good idea if you don't know, only MySql allow this, whereas other SQL languages will throw an error because that is the correct behavior! – shamaseen Apr 06 '21 at 17:23
  • OP ask about MySQL only not for other – xkeshav Apr 07 '21 at 01:12
  • it really works – Alex Jun 06 '23 at 21:42
98

To avoid potentially unexpected results when using GROUP BY without an aggregate function, as is used in the accepted answer, because MySQL is free to retrieve ANY value within the data set being grouped when not using an aggregate function [sic] and issues with ONLY_FULL_GROUP_BY. Please consider using an exclusion join.

Exclusion Join - Unambiguous Entities

Assuming the firstname and lastname are uniquely indexed (unambiguous), an alternative to GROUP BY is to sort using a LEFT JOIN to filter the result set, otherwise known as an exclusion JOIN.

See Demonstration

Ascending order (A-Z)

To retrieve the distinct firstname ordered by lastname from A-Z

Query

SELECT t1.*
FROM table_name AS t1
LEFT JOIN table_name AS t2
ON t1.firstname = t2.firstname
AND t1.lastname > t2.lastname
WHERE t2.id IS NULL;

Result

| id | firstname | lastname |
|----|-----------|----------|
|  2 |      Bugs |    Bunny |
|  1 |      John |      Doe |

Descending order (Z-A)

To retrieve the distinct firstname ordered by lastname from Z-A

Query

SELECT t1.*
FROM table_name AS t1
LEFT JOIN table_name AS t2
ON t1.firstname = t2.firstname
AND t1.lastname < t2.lastname
WHERE t2.id IS NULL;

Result

| id | firstname | lastname |
|----|-----------|----------|
|  2 |      Bugs |    Bunny |
|  3 |      John |  Johnson |

You can then order the resulting data as desired.


Exclusion Join - Ambiguous Entities

If the first and last name combination are not unique (ambiguous) and you have multiple rows of the same values, you can filter the result set by including an OR condition on the JOIN criteria to also filter by id.

See Demonstration

table_name data

(1, 'John', 'Doe'),
(2, 'Bugs', 'Bunny'),
(3, 'John', 'Johnson'),
(4, 'John', 'Doe'),
(5, 'John', 'Johnson')

Query

SELECT t1.*
FROM table_name AS t1
LEFT JOIN table_name AS t2
ON t1.firstname = t2.firstname
AND (t1.lastname > t2.lastname
OR (t1.firstname = t1.firstname AND t1.lastname = t2.lastname AND t1.id > t2.id))
WHERE t2.id IS NULL;

Result

| id | firstname | lastname |
|----|-----------|----------|
|  1 |      John |      Doe |
|  2 |      Bugs |    Bunny |

Composite IN() Subquery - Unambiguous Entities

For larger datasets, using an exclusion join can be very slow. If you have unambiguous entries, an alternative is to use a Composite IN() criteria against a MIN/MAX aggregate subquery.

Demonstration

Ascending Order (A-Z)

Query

SELECT t1.*
FROM table_name AS t1
WHERE (t1.firstname, t1.lastname) IN(
   SELECT firstname, MIN(lastname)
   FROM table_name
   GROUP BY firstname
)

Result

| id | firstname | lastname |
|----|-----------|----------|
|  2 |      Bugs |    Bunny |
|  1 |      John |      Doe |

Descending Order (Z-A)

Query

SELECT t1.*
FROM table_name AS t1
WHERE (t1.firstname, t1.lastname) IN(
   SELECT firstname, MAX(lastname)
   FROM table_name
   GROUP BY firstname
)

Result

| id | firstname | lastname |
|----|-----------|----------|
|  2 |      Bugs |    Bunny |
|  3 |      John |  Johnson |

Composite IN() Dependent Subquery - Ambiguous Entities

The same theory can be applied from the Ambiguous Exclusion join to the composite IN() subquery method, by adding a dependent subquery with MIN/MAX on the id column.

Demonstration

Query

SELECT t1.*
FROM table_name AS t1
WHERE t1.id IN(
   SELECT MIN(id)
   FROM table_name
   WHERE (t1.firstname, t1.lastname) IN(
      SELECT firstname, MIN(lastname)
      FROM table_name
      GROUP BY firstname
   )
   GROUP BY firstname, lastname
);

Result

| id | firstname | lastname |
|----|-----------|----------|
|  1 |      John |      Doe |
|  2 |      Bugs |    Bunny |

Ordered Subquery

EDIT

My original answer using an ordered subquery, was written prior to MySQL 5.7.5, which is no longer applicable, due to the changes with ONLY_FULL_GROUP_BY. Please use the one of the examples above instead.

It is also important to note; when ONLY_FULL_GROUP_BY is disabled (original behavior prior to MySQL 5.7.5), the use of GROUP BY without an aggregate function may yield unexpected results, because MySQL is free to choose ANY value within the data set being grouped [sic].

Meaning an ID or lastname value may be retrieved that is not associated with the retrieved firstname row.


WARNING

With MySQL GROUP BY may not yield the expected results when used with ORDER BY

See Test Case Example

The best method of implementation, to ensure expected results, is to filter the result set scope using an ordered subquery.

table_name data

(1, 'John', 'Doe'),
(2, 'Bugs', 'Bunny'),
(3, 'John', 'Johnson')

Query

SELECT * FROM (
    SELECT * FROM table_name ORDER BY ID DESC
) AS t1
GROUP BY FirstName

Result (MySQL 5.6)

| ID | first |    last |
|----|-------|---------|
|  2 |  Bugs |   Bunny |
|  3 |  John | Johnson |

Comparison

To demonstrate the unexpected results when using GROUP BY in combination with ORDER BY

Query

SELECT * FROM table_name GROUP BY FirstName ORDER BY ID DESC

Result (MySQL 5.6)

| ID | first |  last |
|----|-------|-------|
|  2 |  Bugs | Bunny |
|  1 |  John |   Doe |
Will B.
  • 17,883
  • 4
  • 67
  • 69
  • 3
    Most complete answer by far. Changing 'ID desc' to 'ID asc' in the first query allow us to retrieve either 'John Doe' or 'John Johnson'. Changing 'ID desc' in second query does'nt have this effect. – carla Jun 19 '15 at 15:44
  • On postgres you need ID in group by not sure of mysql. – Sachin Prasad Aug 20 '15 at 14:04
  • Will a GROUP BY column-A ORDER BY column-B in one SELECT statement always work correctly with latest version of MyriaDB? – Neal Davis Aug 07 '18 at 22:43
  • @NealDavis As per **MariaDB** manual `Ordering is done after grouping.`, so No not in this use-case, in addition [MariaDB ignores ORDER BY in subqueries](https://mariadb.com/kb/en/library/why-is-order-by-in-a-from-subquery-ignored/) (as per the SQL standard) without a `LIMIT`. You would want to use a [`Window Function`](https://mariadb.com/kb/en/library/window-functions/) For more clarification you should ask your question in the [DBA stackexchange](https://dba.stackexchange.com), as this is a question relating to MySQL – Will B. Aug 07 '18 at 22:57
  • To be clear, is it safe to use a subquery with `order by` and then `group by`, as in your section labeled `WARNING`? – NateS Dec 23 '18 at 16:16
  • 1
    @NateS No, the `GROUP BY` can select any value within the grouped data set, unless an aggregate function is used on those columns to force a specific value. So `lastname` or `id` can come from any of the ordered rows. The original subquery example was acceptable by default in `MySQL <= 5.7.4` but technically still suffers from the issue. While the `ORDER BY` does help to prevent a random selection, it is still theoretically possible, but with a significantly less probability than without using the `ORDER BY` subquery. – Will B. Dec 24 '18 at 06:59
80

The DISTINCT keyword doesn't really work the way you're expecting it to. When you use SELECT DISTINCT col1, col2, col3 you are in fact selecting all unique {col1, col2, col3} tuples.

Brian Driscoll
  • 19,373
  • 3
  • 46
  • 65
  • 20
    Thanks for pointing this out Brian. Can you provide an example of how I could utilize GROUP BY to obtain the same results? – m r May 11 '11 at 16:20
24
SELECT ID,LastName 
From TABLE_NAME 
GROUP BY FirstName 
HAVING COUNT(*) >=1
Taryn
  • 242,637
  • 56
  • 362
  • 405
sarath
  • 257
  • 2
  • 3
7

How about

`SELECT 
    my_distinct_column,
    max(col1),
    max(col2),
    max(col3)
    ...
 FROM
    my_table 
 GROUP BY 
    my_distinct_column`
onlinebaba
  • 81
  • 1
  • 2
  • 1
    The issue with using an aggregate function on multiple columns from a single column grouping, is that `MAX()` will retrieve the highest value within the groupings, instead of the corresponding row values of the distinct column that the OP is wanting. [Example](https://www.db-fiddle.com/f/wJ8gtnSntvKcv31mgWWY9o/0) expects `5, John, Johnson` but receives `6, John, Johnson` where `ID 6` is `John, Doe` – Will B. May 02 '21 at 00:12
  • I upvoted the answer but I didn't see that I got wrong results like @Will says. This way you get 'fake' rows with a mix of values from every group. Change MAX() to SUM() to understand it better. (I can't uncheck the upvote) – Ivan Ferrer Villa Jun 09 '21 at 10:04
3
SELECT firstName, ID, LastName from tableName GROUP BY firstName
Nanhe Kumar
  • 15,498
  • 5
  • 79
  • 71
3

As pointed out by fyrye, the accepted answer pertains to older versions of MySQL in which ONLY_FULL_GROUP_BY had not yet been introduced. With MySQL 8.0.17 (used in this example), unless you disable ONLY_FULL_GROUP_BY you would get the following error message:

mysql> SELECT id, firstName, lastName FROM table_name GROUP BY firstName;

ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'mydatabase.table_name.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

One way to work around this not mentioned by fyrye, but described in https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html, is to apply the ANY_VALUE() function to the columns which are not in the GROUP BY clause (id and lastName in this example):

mysql> SELECT ANY_VALUE(id) as id, firstName, ANY_VALUE(lastName) as lastName FROM table_name GROUP BY firstName;
+----+-----------+----------+
| id | firstName | lastName |
+----+-----------+----------+
|  1 | John      | Doe      |
|  2 | Bugs      | Bunny    |
+----+-----------+----------+
2 rows in set (0.01 sec)

As written in the aforementioned docs,

In this case, MySQL ignores the nondeterminism of address values within each name group and accepts the query. This may be useful if you simply do not care which value of a nonaggregated column is chosen for each group. ANY_VALUE() is not an aggregate function, unlike functions such as SUM() or COUNT(). It simply acts to suppress the test for nondeterminism.

Kurt Peek
  • 52,165
  • 91
  • 301
  • 526
  • For clarification, I specifically avoided suggesting to use `ANY_VALUE()` as my answer and comments are focused on preventing ambiguous and unpredictable result-sets. Since as the function name suggests, it could result in any value from the selected rows being retrieved. I would suggest using `MAX` or `MIN` instead. – Will B. Oct 29 '19 at 17:04
2

Not sure if you can do this with MySQL, but you can use a CTE in T-SQL

; WITH tmpPeople AS (
 SELECT 
   DISTINCT(FirstName),
   MIN(Id)      
 FROM People
)
SELECT
 tP.Id,
 tP.FirstName,
 P.LastName
FROM tmpPeople tP
JOIN People P ON tP.Id = P.Id

Otherwise you might have to use a temporary table.

Thomas Langston
  • 3,743
  • 1
  • 25
  • 41
1

Keep in mind when using the group by and order by that MySQL is the ONLY database that allows for columns to be used in the group by and/or order by piece that are not part of the select statement.

So for example: select column1 from table group by column2 order by column3

That will not fly in other databases like Postgres, Oracle, MSSQL, etc. You would have to do the following in those databases

select column1, column2, column3 from table group by column2 order by column3

Just some info in case you ever migrate your current code to another database or start working in another database and try to reuse code.

-2

You can use group by for display distinct values and also corresponding fields.

select * from tabel_name group by FirstName

Now you got output like this:

ID    FirstName     LastName
2     Bugs          Bunny
1     John          Doe


If you want to answer like

ID    FirstName     LastName
1     John          Doe
2     Bugs          Bunny

then use this query,

select * from table_name group by FirstName order by ID
Alive to die - Anant
  • 70,531
  • 10
  • 51
  • 98
John
  • 723
  • 1
  • 10
  • 12
-3
SELECT DISTINCT(firstName), ID, LastName from tableName GROUP BY firstName

Would be the best bet IMO

Alive to die - Anant
  • 70,531
  • 10
  • 51
  • 98
Monty
  • 1,304
  • 4
  • 19
  • 37
-5
SELECT DISTINCT (column1), column2
FROM table1
GROUP BY column1
Alive to die - Anant
  • 70,531
  • 10
  • 51
  • 98
mack
  • 23
  • 1
  • 1
    `DISTINCT()` is not a function. Also DISTINCT and GROUP BY are doing the same thing, so no reason put them both. – Marki555 Jul 15 '15 at 07:37
  • This is not an efficient statement, you should use either DISTINCT or Group By not both. – heshanlk Aug 18 '17 at 03:33