What is the difference between GROUP BY and DISTINCT?

Question

I have the table with the following data

empid   empname deptid   address
--------------------------------
aa76    John     6       34567
aa75    rob      4       23456
aa71    smith    3       12345
aa74    dave     2       12345
a77     blake    2       12345
aa73    andrew   3       12345
aa90    sam      1       12345
aa72    will     6       34567
aa70    rahul    5       34567

I've used the following queries:

select deptid, EMPID ,EMPNAME ,ADDRESS
from mytable
group by 1,2,3,4

Which gives the result:

deptid  empid  empname address
------------------------------
1       aa90   sam      12345
2       aa74   dave     12345
2       aa77   blake    12345
3       aa71   smith    12345
3       aa73   andrew   12345
4       aa75   rob      23456
5       aa70   rahul    34567
6       aa76   John     34567
6       aa72   will     34567

And for the query:

select distinct (deptid),EMPID,EMPNAME,ADDRESS
from mytable

The result set is:

deptid empid empname address   
----------------------------
1      aa90  sam     12345
2      aa74  dave    12345
2      aa77  blake   12345
3      aa71  smith   12345
3      aa73  andrew  12345
4      aa75  rob     23456
5      aa70  rahul   34567
6      aa72  will    34567
6      aa76  John    34567

In the second query though I've given DISTINCT for DEPTID, how come I got the duplicate DEPTID...

Could you explain this?

I guess you need to first explain what result you are expecting ? DISTINCT works on all columns. so your second query gave you 2 unique rows of data even though deptid is same. Use GROUP BY for aggregate functions like count or sum. — Rahul, Nov 12 '09 at 05:40
Referring to a column by position in the SELECT clause is called an ordinal. IE: `group by 1,2,3,4` - it's not recommended in case the SELECT columns change. — OMG Ponies, Nov 12 '09 at 05:43
Possible duplicate: http://stackoverflow.com/questions/426723/sql-group-by-versus-distinct — OMG Ponies, Nov 12 '09 at 05:44
As a side-note; on Teradata 13.10 (and older versions), using DISTINCT can cause a sort operation, which is slower than GROUP BY. So from a performance perspective, one should prefer GROUP BY over DISTINCT wherever possible. I think this is fixed in the 14.10 version. — xenodevil, Mar 26 '14 at 13:23

score 10 · Answer 1 · edited Sep 05 '12 at 05:17

10

DISTINCT eliminates repeating rows. GROUP BY groups unique records, and allows you to perform aggregate functions.

edited Sep 05 '12 at 05:17

Himanshu

31,810
31
111
133

answered Nov 12 '09 at 05:38

Robert Harvey

178,213
47
333
501

score 6 · Accepted Answer · edited Sep 05 '12 at 05:17

6

DISTINCT refer to distinct records as a whole, not distinct fields in the record.

edited Sep 05 '12 at 05:17

Himanshu

31,810
31
111
133

answered Nov 12 '09 at 05:37

Anton

4,554
2
37
60

This is were i was confused.. I was expecting distinct to return distinct column... Its cleared for me now.. – SrinR Nov 12 '09 at 08:55

score 2 · Answer 3 · edited Mar 13 '13 at 10:24

2

DISTINCT works only on the entire row. Don't be mislead into thinking SELECT DISTINCT(A), B does something different. This is equivalent to SELECT DISTINCT A, B

edited Mar 13 '13 at 10:24

Taryn

242,637
56
362
405

answered Mar 13 '13 at 10:03

Shakthi

21
1

lins314159 · Answer 4 · 2009-11-13T01:57:31.247

1

While group by all columns and distinct will give you the same results in Teradata, they have different algorithms behind the scenes and you will generally get better performance from using group by than from using distinct. I believe there were plans to have both implemented the same way, but they are still different in the version I'm using (v2r6) and I haven't tried on Teradata 12 yet.

edited Nov 13 '09 at 01:57

answered Nov 12 '09 at 23:36

lins314159

2,510
1
16
19

1

See the explanation between how GROUP BY vs DISTINCT is handled by the optimizer here: http://forums.teradata.com/forum/enterprise/distinct-vs-group-by-insert-vs-create-as It appears that in Teradata 13 that is the point where the optimizer improves the performance of DISTINCT as indicated in the link above. – Rob Paller Mar 19 '10 at 03:42

score 0 · Answer 5 · answered Mar 13 '13 at 16:14

0

Group By and Distinct both will work Same. Comparing to Distinct Group By Gives good performance because it processes less rows and occupies less spool memory

answered Mar 13 '13 at 16:14

PhaniKumar

1
1

score -1 · Answer 6 · answered Feb 10 '10 at 09:02

-1

Distinct will not work fine with multi column. though given distinct on single column but it gives the unique combination of specified columns.

So, Group by gives the unique records and can do aggregates too.

answered Feb 10 '10 at 09:02

limba

1

score -1 · Answer 7 · edited Mar 15 '14 at 16:30

I don't know how to explain the difference but I give you the examples _with_queries_ through this you can better understand the difference between GROUP BY and DISTINCT.

Question: How many people are in each unique state in the customers table

select distinct(state), count(*) from customers;

RESULT

Washington  17
----------------------------------------------------------

select State, count(*) from customers GROUP BY STATE;

RESULT

**Arizona    6
Colorado         2
Hawaii           1
Idaho            1
North Carolina   1
Oregon           2
Sourth Carolina  1
Washington   2
Wisconsin    1**

Just make your own table and check the result

This is definitely not the result in Teradata (and shouldn't be in other DBMSes), it will result in a syntax error. Which DBMS did you use for it? MySQL? — dnoeth, Mar 15 '14 at 16:52

What is the difference between GROUP BY and DISTINCT?

7 Answers7