154

When I perform SELECT * FROM table I got results like below:

1 item1 data1
2 item1 data2
3 item2 data3
4 item3 data4

As you can see, there are dup records from column2 (item1 are dupped). So how could I just get result like this:

1 item1 data1
2 item2 data3
3 item3 data4

Only one record are returned from the duplicate, along with the rest of the unique records.

JoSSte
  • 2,953
  • 6
  • 34
  • 54
Yinan
  • 2,516
  • 4
  • 22
  • 23
  • 3
    Item 1 isn't technically duplicated. As shown, rows 1 and 2 are unique observations. What if you wanted to keep row 2 and not row 1? – Cybernetic Sep 14 '20 at 20:16
  • If you are using Postgres, there is a specific operator for this. See [the accepted answer to this question](https://stackoverflow.com/a/16918028/558006) – brotskydotcom Jun 14 '22 at 21:20

10 Answers10

174

With the distinct keyword with single and multiple column names, you get distinct records:

SELECT DISTINCT column 1, column 2, ...
FROM table_name;
Arsen Khachaturyan
  • 7,904
  • 4
  • 42
  • 42
mjallday
  • 9,796
  • 9
  • 51
  • 71
  • 36
    Can it be that the answer is actually wrong? DISTINCT is applied to all selected columns (at least on a DB2), which still will return duplicate values in individual columns. – Konstantin Jan 04 '18 at 10:46
  • 6
    Yes, `DISTINCT` will not give the desired result (atleast on Postgres) , as its applied to all the select columns and not just to `column1` as specified in this answer. – Binita Bharati Sep 09 '21 at 11:37
  • 4
    This does not work in Bigquery – Spandyie Dec 10 '21 at 17:46
90

There are 4 methods you can use:

  1. DISTINCT
  2. GROUP BY
  3. Subquery
  4. Common Table Expression (CTE) with ROW_NUMBER()

Consider the following sample TABLE with test data:

/** Create test table */
CREATE TEMPORARY TABLE dupes(word text, num int, id int);

/** Add test data with duplicates */
INSERT INTO dupes(word, num, id)
VALUES ('aaa', 100, 1)
      ,('bbb', 200, 2)
      ,('ccc', 300, 3)
      ,('bbb', 400, 4)
      ,('bbb', 200, 5)     -- duplicate
      ,('ccc', 300, 6)     -- duplicate
      ,('ddd', 400, 7)
      ,('bbb', 400, 8)     -- duplicate
      ,('aaa', 100, 9)     -- duplicate
      ,('ccc', 300, 10);   -- duplicate

Option 1: SELECT DISTINCT

This is the most simple and straight forward, but also the most limited way:

SELECT DISTINCT word, num 
FROM    dupes
ORDER BY word, num;

/*
word|num|
----|---|
aaa |100|
bbb |200|
bbb |400|
ccc |300|
ddd |400|
*/

Option 2: GROUP BY

Grouping allows you to add aggregated data, like the min(id), max(id), count(*), etc:

SELECT  word, num, min(id), max(id), count(*)
FROM    dupes
GROUP BY word, num
ORDER BY word, num;

/*
word|num|min|max|count|
----|---|---|---|-----|
aaa |100|  1|  9|    2|
bbb |200|  2|  5|    2|
bbb |400|  4|  8|    2|
ccc |300|  3| 10|    3|
ddd |400|  7|  7|    1|
*/

Option 3: Subquery

Using a subquery, you can first identify the duplicate rows to ignore, and then filter them out in the outer query with the WHERE NOT IN (subquery) construct:

/** Find the higher id values of duplicates, distinct only added for clarity */
    SELECT  distinct d2.id
    FROM    dupes d1
        INNER JOIN dupes d2 ON d2.word=d1.word AND d2.num=d1.num
    WHERE d2.id > d1.id

/*
id|
--|
 5|
 6|
 8|
 9|
10|
*/

/** Use the previous query in a subquery to exclude the dupliates with higher id values */
SELECT  *
FROM    dupes
WHERE   id NOT IN (
    SELECT  d2.id
    FROM    dupes d1
        INNER JOIN dupes d2 ON d2.word=d1.word AND d2.num=d1.num
    WHERE d2.id > d1.id
)
ORDER BY word, num;

/*
word|num|id|
----|---|--|
aaa |100| 1|
bbb |200| 2|
bbb |400| 4|
ccc |300| 3|
ddd |400| 7|
*/

Option 4: Common Table Expression with ROW_NUMBER()

In the Common Table Expression (CTE), select the ROW_NUMBER(), partitioned by the group column and ordered in the desired order. Then SELECT only the records that have ROW_NUMBER() = 1:

WITH CTE AS (
    SELECT  *
           ,row_number() OVER(PARTITION BY word, num ORDER BY id) AS row_num
    FROM    dupes
)
SELECT  word, num, id 
FROM    cte
WHERE   row_num = 1
ORDER BY word, num;

/*
word|num|id|
----|---|--|
aaa |100| 1|
bbb |200| 2|
bbb |400| 4|
ccc |300| 3|
ddd |400| 7|
*/
isapir
  • 21,295
  • 13
  • 115
  • 116
36

If you only need to remove duplicates then use DISTINCT. GROUP BY should be used to apply aggregate operators to each group

GROUP BY v DISTINCT

rahul
  • 184,426
  • 49
  • 232
  • 263
14

It depends on which rown you want to return for each unique item. Your data seems to indicate the minimum data value so in this instance for SQL Server.

SELECT item, min(data)
FROM  table
GROUP BY item
Dave Barker
  • 6,303
  • 2
  • 24
  • 25
10

I am not sure if the accepted answer works. It does not work on postgres 12 at least. DISTINCT keyword is supposed to be applied to all the columns in the select query and not just to the column next to which DISTINCT keyword is written. So, basically, it means that every row returned in the result will be unique in terms of the combination of the select query columns. In OP's question, the below two result rows are already distinct, as they have different values for column1 and column 3.

1 item1 data1
2 item1 data2

Now, to answer the question, atleast in postgres, there is a DISTINCT ON keyword. This will achieve what the OP requires.

select DISTINCT ON(column2) column1, column3 from Table1;

Binita Bharati
  • 5,239
  • 1
  • 43
  • 24
  • 2
    This is a super handy feature of Postgres, thanks for pointing it out in the context of this question! It's also captured as the [accepted answer of this other question](https://stackoverflow.com/a/16918028/558006) – brotskydotcom Jun 14 '22 at 21:18
6

just use inner join because group by won't work with multiple columns saying not contained in either an aggregate function.

SELECT a.*
FROM yourtable a
INNER JOIN 
  (SELECT yourcolumn,
    MIN(id) as id
  FROM yourtable 
  GROUP BY yourcolumn
) AS b
  ON a.yourcolumn= b.yourcolumn
  AND a.id = b.id;
Ankit Kashyap
  • 79
  • 1
  • 11
  • That is the answer to a different question, probably one that should be tagged with [tag:greatest-n-per-group] –  Jan 03 '18 at 08:28
  • This and the solution of Dave Baker are the correct solutions for the SO question. The advantage of this solution is that it allows to select rows with only some specified distinct columns and one column MIN(id) AS id has to be defined to select only one of the multiple specified columns. – giordano Mar 01 '19 at 15:00
4

To get all the columns in your result you need to place something as:

SELECT distinct a, Table.* FROM Table

it will place a as the first column and the rest will be ALL of the columns in the same order as your definition. This is, column a will be repeated.

htafoya
  • 18,261
  • 11
  • 80
  • 104
  • 1
    Are you sure about this? I tried this on w3schools and it returned the same as SELECT *, except a was the first column – Freakishly Jul 01 '15 at 21:31
  • @Freakishly yes and that is exactly what says it will do in my answer :/ – htafoya Jul 17 '15 at 00:36
  • This won't work, you can't select * after the distinct like that (you will get a 1064 error - Error in your SQL syntax) – tim.baker May 31 '17 at 08:47
  • @Mohsinkhan well I forgot to place that you need to write the table name. Somehow when I wrote this it worked but I just tested now and it didn't without the table name before the * – htafoya Nov 15 '17 at 07:41
  • 2
    This is exactly the same as `select distinct * from ...` –  Nov 15 '17 at 08:03
1

I find that if I can't use DISTINCT for any reason, then GROUP BY will work.

John Hamelink
  • 1,036
  • 2
  • 15
  • 33
0

I think person, who want to operate query, which want to write query like asterix(*) instead of column(s). if so than this question helps to him answer.

how to select rows based on distinct values of A COLUMN only

-4
SELECT 
  Eff_st 
FROM 
  (
    SELECT 
      EFF_ST, 
      ROW_NUMBER() over(PARTITION BY eff_st) XYZ - 
    FROM 
      ABC.CODE_DIM
  ) 
WHERE 
  XYZ = 1 
ORDER BY 
  EFF_STFETCH FIRST 5 row only

RF1991
  • 2,037
  • 4
  • 8
  • 17
  • Welcome to Stack Overflow. Code is a lot more helpful when it is accompanied by an explanation. Stack Overflow is about learning, not providing snippets to blindly copy and paste. This is particularly important when answering old questions (this one is nearly _13 years old_). Please [edit] your answer and explain how it answers the specific question being asked, and how it improves upon what is already here. See [answer]. – ChrisGPT was on strike Sep 18 '22 at 17:57
  • DONT COPY AND PAST CODE. – Mohammad Dec 20 '22 at 13:05
  • I found this solution very helpful – Sergej Loos Feb 22 '23 at 15:03