Is there any difference between GROUP BY and DISTINCT

Question

I learned something simple about SQL the other day:

SELECT c FROM myTbl GROUP BY C

Has the same result as:

SELECT DISTINCT C FROM myTbl

What I am curious about is whether an SQL engine processes the two commands any differently, or whether they are truly the same thing.

I personally prefer the distinct syntax, but I am sure it's more out of habit than anything else.

EDIT: This is not a question about aggregates. The use of GROUP BY with aggregate functions is understood.

This is not a question about aggregates, it is a GROUP BY functioning the same as a distinct when no aggregate function is present — Brettski, Oct 02 '08 at 20:25
You can also do `SELECT c FROM myTbl UNION SELECT c FROM myTbl` and get the same result... But why complicate things when SELECT DISTINCT is so easy. — jarlh, Jul 05 '17 at 14:57
The 'logical order of execution' of `GROUP BY` is far earlier than 'SELECT' and `DISTINCT` follows select. — Paul Maxwell, Oct 20 '17 at 05:38
One very minor difference that I haven't seen mentioned is that `DISTINCT` results in actually selecting the field - i.e. the value will appear in the result set. `GROUP BY` can effectively remove duplicates without actually selecting the field. This is somewhat irrelevant in most cases, but could be exactly what you want in others. If you end up using `GROUP BY` in place of `DISTINCT`, an explanatory comment in the code is probably warranted. — rinogo, May 01 '18 at 18:47
The bottom line seems to be that because duplicate removal occurs at different points in the execution plan, one can be more efficient than the other because dup removal requires a sort or perhaps use of this index over that index. Thus there may be an advantage from early dup removal or the advantage may come from use of a different index early on and eating a sort later when there are few rows left and sorting is negligible. — bielawski, Dec 20 '18 at 13:02
On dba the question [mysql-using-distinct-and-group-by-together](https://dba.stackexchange.com/questions/262408/) contains useful replies as well. – surfmuggle, Jul 06 '21 at 10:22

score 325 · Accepted Answer · edited Feb 14 '19 at 23:55

MusiGenesis' response is functionally the correct one with regard to your question as stated; the SQL Server is smart enough to realize that if you are using "Group By" and not using any aggregate functions, then what you actually mean is "Distinct" - and therefore it generates an execution plan as if you'd simply used "Distinct."

However, I think it's important to note Hank's response as well - cavalier treatment of "Group By" and "Distinct" could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates" because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.

A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?

(for the purposes of this analogy, Hammer : Screwdriver :: GroupBy : Distinct and screw => get list of unique values in a table column)

edited Feb 14 '19 at 23:55 by brett rogers
answered Oct 02 '08 at 20:52 by Skeolan

I am in complete agreement with you Skeolan. I was quite surprised when I came across this functionality. It isn't something I plan to use, but a way things have been done at this new place I am working at. – Brettski Oct 02 '08 at 21:15
At least in Oracle 12 there do appear to be cases where DISTINCT, getting distinct values by UNION, and GROUP BY work differently. I just had a case earlier today where DISTINCT and distinct by UNION cause an oracle error, but GROUP BY worked; I was selecting only 1 column from a view and not using any aggregation; I'm still baffled why it required it, but it does confirm there is some difference in the execution. As others point out, it also lets you GROUP BY columns not in the select, though that should rarely be necessary without aggregation. – ZeroK Sep 17 '15 at 20:22
When it comes to SQL you always have both a screwdriver and hammer available. Why use a hammer to drive in a screw? – jarlh Jul 05 '17 at 15:02
Just to be clear with regard to your analogy - is your hammer == GroupBy and screwdriver == Distinct in this case? – HopeKing Feb 12 '18 at 09:10
Wow, this ten-year-old question still has legs! "Distinct" is the screwdriver, if "list of unique values" is the screw. I'll update the answer to make the analogy clearer. – Skeolan Feb 14 '18 at 20:01
In the Amazon Redshift Spectrum case, it's better to use `GROUP BY` because it's pushed down to the Spectrum layer and Redshift only displays the result data. If `DISTINCT` is used for the same query on Spectrum data, Spectrum will bring all scanned data to Redshift and the Redshift leader node will execute DISTINCT, since it's a leader node function – demircioglu Aug 05 '19 at 23:44
Of all the definitions of "screw" that I've ever used, **"get list of unique values in a table column"** had not been one of them. – dotancohen Aug 09 '21 at 06:47

score 182 · Answer 2 · edited May 23 '19 at 00:06

GROUP BY lets you use aggregate functions, like AVG, MAX, MIN, SUM, and COUNT. On the other hand DISTINCT just removes duplicates.

For example, if you have a bunch of purchase records, and you want to know how much was spent by each department, you might do something like:

SELECT department, SUM(amount) FROM purchases GROUP BY department

This will give you one row per department, containing the department name and the sum of all of the amount values in all rows for that department.
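A rough way to try this out is Python's built-in `sqlite3` module with an in-memory database. The rows below are made up for illustration; only the `purchases`, `department`, and `amount` names come from the query above, and other engines may of course behave differently in details such as ordering.

```python
import sqlite3

# In-memory database with hypothetical purchase records (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (department TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("IT", 100), ("IT", 250), ("HR", 40), ("HR", 60), ("Sales", 75)],
)

# GROUP BY collapses the rows to one per department and sums within each group.
totals = conn.execute(
    "SELECT department, SUM(amount) FROM purchases "
    "GROUP BY department ORDER BY department"
).fetchall()

# DISTINCT on the same column only removes duplicates; nothing is summed.
departments = conn.execute(
    "SELECT DISTINCT department FROM purchases ORDER BY department"
).fetchall()

print(totals)       # one (department, total) row per department
print(departments)  # one (department,) row per department
```

Both results have one row per department; only the GROUP BY version carries the per-group total.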

edited May 23 '19 at 00:06 by eeqk
answered Oct 02 '08 at 20:10 by Andru Luvisi

The use of GROUP BY I understand, The question is based on the fact that it returns a distinct dataset when no aggregate function is present. – Brettski Oct 02 '08 at 20:27
Because GROUP BY implicitly does a DISTINCT over the values of the column you're grouping by (sorry for the cacophony). – Joe Pineda Oct 02 '08 at 21:37
Is it not possible to use `DISTINCT` together with aggregate functions? Like this: `select distinct department, SUM(amount) from ...` – Shafizadeh Aug 24 '15 at 18:08
@Sajad, You can do that yes, but you still have to have the GROUP BY, so the DISTINCT doesn't do anything for you. – ZeroK Sep 17 '15 at 20:29

score 92 · Answer 3 · answered Aug 23 '17 at 07:43

What's the difference from a mere duplicate removal functionality point of view

Apart from the fact that unlike DISTINCT, GROUP BY allows for aggregating data per group (which has been mentioned by many other answers), the most important difference in my opinion is the fact that the two operations "happen" at two very different steps in the logical order of operations that are executed in a SELECT statement.

Here are the most important operations:

FROM (including JOIN, APPLY, etc.)
WHERE
GROUP BY (can remove duplicates)
Aggregations
HAVING
Window functions
SELECT
DISTINCT (can remove duplicates)
UNION, INTERSECT, EXCEPT (can remove duplicates)
ORDER BY
OFFSET
LIMIT

As you can see, the logical order of each operation influences what can be done with it and how it influences subsequent operations. In particular, the fact that the GROUP BY operation "happens before" the SELECT operation (the projection) means that:

It doesn't depend on the projection (which can be an advantage)
It cannot use any values from the projection (which can be a disadvantage)

1. It doesn't depend on the projection

An example where not depending on the projection is useful is if you want to calculate window functions on distinct values:

SELECT rating, row_number() OVER (ORDER BY rating) AS rn
FROM film
GROUP BY rating

When run against the Sakila database, this yields:

rating   rn
-----------
G        1
NC-17    2
PG       3
PG-13    4
R        5

The same couldn't be achieved with DISTINCT easily:

SELECT DISTINCT rating, row_number() OVER (ORDER BY rating) AS rn
FROM film

That query is "wrong" and yields something like:

rating   rn
------------
G        1
G        2
G        3
...
G        178
NC-17    179
NC-17    180
...

This is not what we wanted. The DISTINCT operation "happens after" the projection, so we can no longer remove DISTINCT ratings because the window function was already calculated and projected. In order to use DISTINCT, we'd have to nest that part of the query:

SELECT rating, row_number() OVER (ORDER BY rating) AS rn
FROM (
  SELECT DISTINCT rating FROM film
) f

Side-note: In this particular case, we could also use DENSE_RANK()

SELECT DISTINCT rating, dense_rank() OVER (ORDER BY rating) AS rn
FROM film
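The four queries in this section can be compared side by side with a small sketch. It assumes Python's `sqlite3` module with a SQLite build that supports window functions (3.25 or newer), and it uses a tiny made-up `film` table rather than the real Sakila data, so the row counts differ from the output shown above.

```python
import sqlite3

# Tiny stand-in for Sakila's film table (hypothetical rows, 3 distinct ratings).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, rating TEXT)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("a", "G"), ("b", "G"), ("c", "PG"), ("d", "PG"), ("e", "PG"), ("f", "R")],
)

# GROUP BY runs before the window function: one numbered row per rating.
grouped = conn.execute(
    "SELECT rating, row_number() OVER (ORDER BY rating) AS rn "
    "FROM film GROUP BY rating ORDER BY rn"
).fetchall()

# DISTINCT runs after the projection: every film row already has a unique rn.
distinct_rows = conn.execute(
    "SELECT DISTINCT rating, row_number() OVER (ORDER BY rating) AS rn "
    "FROM film ORDER BY rn"
).fetchall()

# Nesting the DISTINCT restores the GROUP BY result.
nested = conn.execute(
    "SELECT rating, row_number() OVER (ORDER BY rating) AS rn "
    "FROM (SELECT DISTINCT rating FROM film) f ORDER BY rn"
).fetchall()

# dense_rank() gives equal ratings the same number, so DISTINCT can collapse them.
dense = conn.execute(
    "SELECT DISTINCT rating, dense_rank() OVER (ORDER BY rating) AS rn "
    "FROM film ORDER BY rn"
).fetchall()

print(len(grouped), len(distinct_rows), len(nested), len(dense))  # 3 6 3 3
```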

2. It cannot use any values from the projection

One of SQL's drawbacks is its verbosity at times. For the same reason as what we've seen before (namely the logical order of operations), we cannot "easily" group by something we're projecting.

This is invalid in standard SQL, although some dialects accept it:

SELECT first_name || ' ' || last_name AS name
FROM customer
GROUP BY name

This is valid (repeating the expression)

SELECT first_name || ' ' || last_name AS name
FROM customer
GROUP BY first_name || ' ' || last_name

This is valid, too (nesting the expression)

SELECT name
FROM (
  SELECT first_name || ' ' || last_name AS name
  FROM customer
) c
GROUP BY name

I've written about this topic more in depth in a blog post

I was honestly surprised to see that the order of execution wasn't discussed immediately on this question. Thank you, very nicely explained too. On your point 2. some (one?) db's do allow use of select aliases throughout the query (the one I know of is Teradata, but it is an exception). — Paul Maxwell, Oct 20 '17 at 05:33
@Used_By_Already: Sure, some databases do that. Many databases allow the use of those aliases in only parts (e.g. not `WHERE` but perhaps `GROUP BY`). In any case, I think it's a bad idea and I suggest never using that feature for portability and maintenance reasons. "Suddenly" it won't work anymore, e.g. when aliasing an aggregate function or window function. — Lukas Eder, Oct 20 '17 at 09:10
`never using that feature for portability and maintenance reasons` !! agreed 100% ... & I'm now enjoying your blog too, great work. Cheers. – Paul Maxwell, Oct 20 '17 at 09:34
This is a great high-quality answer that helped educate me on SQL order of operations. Very enlightening, thank you. — Jesse, May 03 '22 at 11:31
This is some great info @LukasEder, I want to add that for MySQL/MariaDB, using DISTINCT vs GROUP BY to remove duplicates (no aggregations) doesn't affect the execution plan despite any order of query operations. – TheCaffinatedDeveloper, Oct 23 '22 at 23:23

MusiGenesis · Answer 4 · 2013-04-05T00:28:33.550

52

There is no difference (in SQL Server, at least). Both queries use the same execution plan.

http://sqlmag.com/database-performance-tuning/distinct-vs-group

Maybe there is a difference, if there are sub-queries involved:

http://blog.sqlauthority.com/2007/03/29/sql-server-difference-between-distinct-and-group-by-distinct-vs-group-by/

There is no difference (Oracle-style):

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:32961403234212

edited Apr 05 '13 at 00:28
answered Oct 02 '08 at 20:41 by MusiGenesis

Updated Link for the first one: https://www.itprotoday.com/sql-server/distinct-vs-group – TheCaffinatedDeveloper Oct 23 '22 at 23:25

score 39 · Answer 5 · edited Sep 05 '12 at 05:09

Use DISTINCT if you just want to remove duplicates. Use GROUP BY if you want to apply aggregate operators (MAX, SUM, GROUP_CONCAT, ..., or a HAVING clause).

edited Sep 05 '12 at 05:09 by Himanshu
answered Oct 02 '08 at 20:11 by jkramer

score 22 · Answer 6 · answered Oct 02 '08 at 20:51

I expect there is the possibility for subtle differences in their execution. I checked the execution plans for two functionally equivalent queries along these lines in Oracle 10g:

core> select sta from zip group by sta;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    58 |   174 |    44  (19)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    58 |   174 |    44  (19)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| ZIP  | 42303 |   123K|    38   (6)| 00:00:01 |
---------------------------------------------------------------------------

core> select distinct sta from zip;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    58 |   174 |    44  (19)| 00:00:01 |
|   1 |  HASH UNIQUE       |      |    58 |   174 |    44  (19)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| ZIP  | 42303 |   123K|    38   (6)| 00:00:01 |
---------------------------------------------------------------------------

The middle operation is slightly different: "HASH GROUP BY" vs. "HASH UNIQUE", but the estimated costs etc. are identical. I then executed these with tracing on and the actual operation counts were the same for both (except that the second one didn't have to do any physical reads due to caching).

But I think that because the operation names are different, the execution would follow somewhat different code paths and that opens the possibility of more significant differences.

I think you should prefer the DISTINCT syntax for this purpose. It's not just habit, it more clearly indicates the purpose of the query.

score 15 · Answer 7 · answered Oct 02 '08 at 20:11

For the query you posted, they are identical. But for other queries that may not be true.

For example, it's not the same as:

SELECT C FROM myTbl GROUP BY C, D
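To see why, here is a small sketch using Python's `sqlite3` module. The column types and the rows are assumptions made for the demonstration; only `myTbl`, `C`, and `D` come from the queries in this thread.

```python
import sqlite3

# Hypothetical contents: C = 1 appears with two different D values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTbl (C INTEGER, D TEXT)")
conn.executemany(
    "INSERT INTO myTbl VALUES (?, ?)",
    [(1, "x"), (1, "y"), (2, "x"), (2, "x")],
)

# Results are sorted in Python so the comparison does not rely on engine ordering.
distinct_c = sorted(conn.execute("SELECT DISTINCT C FROM myTbl").fetchall())
group_c = sorted(conn.execute("SELECT C FROM myTbl GROUP BY C").fetchall())
group_cd = sorted(conn.execute("SELECT C FROM myTbl GROUP BY C, D").fetchall())

print(distinct_c)  # [(1,), (2,)]
print(group_c)     # [(1,), (2,)]
print(group_cd)    # [(1,), (1,), (2,)] because (1, x) and (1, y) are separate groups
```

Grouping by a column that is not selected can bring duplicates back into the output, which DISTINCT on the selected column never does.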

answered Oct 02 '08 at 20:11 by Joel Coehoorn

score 14 · Answer 8 · answered May 17 '12 at 16:04

I read all the above comments but didn't see anyone point to the main difference between GROUP BY and DISTINCT apart from the aggregation bit.

DISTINCT returns all the rows and then de-duplicates them, whereas GROUP BY de-duplicates the rows as the algorithm reads them one by one.

This means they can produce different results!

For example, the queries below generate different results:

SELECT distinct ROW_NUMBER() OVER (ORDER BY Name), Name FROM NamesTable

 SELECT ROW_NUMBER() OVER (ORDER BY Name), Name FROM NamesTable
GROUP BY Name

If the table holds 10 names, one of which duplicates another, then the first query returns 10 rows whereas the second query returns 9 rows.

The reason is the one given above, which is why they can behave differently!

answered May 17 '12 at 16:04 by The Light

That's because while you're only grouping by `Name` in the second query, the `distinct` keyword applies to both the columns `Name` and your `ROW_NUMBER()` column in the `select` clause of the first query. Had you also grouped by the first column in the second query, the queries would have returned the same results. – Jul 24 '15 at 06:09
This is an outcome of the `order of execution` of the SQL clauses which is (in a general sense) `FROM and ON (joins)` , `WHERE` , `GROUP BY` , `HAVING` , `SELECT` , `DISTINCT` , `ORDER BY` , `LIMIT / OFFSET / TOP` so the second query the names are reduced in number by group by and later the row_number() is applied resulting in one row per unique name. In the first query row_number() is applied before the distinct is applied, and due to the nature of the row_number() function every row gets a unique integer, thus every row is returned even if there are repeated name values. – Paul Maxwell Oct 20 '17 at 05:24

score 12 · Answer 9 · answered Oct 02 '08 at 20:12

If you use DISTINCT with multiple columns, the result set won't be grouped as it will with GROUP BY, and you can't use aggregate functions with DISTINCT.

answered Oct 02 '08 at 20:12 by Bill the Lizard

score 7 · Answer 10 · answered Oct 02 '08 at 20:20

GROUP BY has a very specific meaning that is distinct (heh) from the DISTINCT keyword.

GROUP BY causes the query results to be grouped using the chosen expression; aggregate functions can then be applied, and these act on each group rather than on the entire result set.

Here's an example that might help:

Given a table that looks like this:

name
------
barry
dave
bill
dave
dave
barry
john

This query:

SELECT name, count(*) AS count FROM table GROUP BY name;

Will produce output like this:

name    count
-------------
barry   2
dave    3
bill    1
john    1

Which is obviously very different from using DISTINCT. If you want to group your results, use GROUP BY, if you just want a unique list of a specific column, use DISTINCT. This will give your database a chance to optimise the query for your needs.
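For anyone who wants to check this, the example can be run as a quick sketch with Python's `sqlite3` module. The table is called `names` here, which is my own hypothetical name (`table` itself is a reserved word), and the rows are the seven listed above.

```python
import sqlite3

# The seven rows from the example table above; "names" is a placeholder table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?)",
    [("barry",), ("dave",), ("bill",), ("dave",), ("dave",), ("barry",), ("john",)],
)

# GROUP BY plus count(*): one row per name together with its frequency.
counts = dict(
    conn.execute("SELECT name, count(*) FROM names GROUP BY name").fetchall()
)

# DISTINCT: just the unique names, with no per-group information.
unique_names = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT name FROM names")
)

print(counts)
print(unique_names)
```

The counts come out as barry 2, dave 3, bill 1, john 1, while the DISTINCT query only lists the four names.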

score 6 · Answer 11 · edited Mar 25 '15 at 14:29

If you use GROUP BY without any aggregate function, then internally it will be treated as DISTINCT, so in this case there is no difference between GROUP BY and DISTINCT.

But since the DISTINCT clause is available, it is better to use it for finding unique records, because the purpose of GROUP BY is aggregation.

edited Mar 25 '15 at 14:29 by Ben Dauphinee
answered Dec 28 '11 at 11:28 by Vikram Mahapatra

score 5 · Answer 12 · answered Oct 02 '08 at 20:10

They have different semantics, even if they happen to have equivalent results on your particular data.

answered Oct 02 '08 at 20:10 by Hank Gay

how is this an answer? Simply stating that it is a difference of semantics adds no information whatsoever. – Flame Nov 14 '20 at 13:50
Have to agree that this is not an answer as it adds literally nothing to the discourse – Cozzbie Apr 09 '22 at 03:51

score 4 · Answer 13 · answered Oct 02 '08 at 20:15

group by is used in aggregate operations -- like when you want to get a count of Bs broken down by column C

select C, count(B) from myTbl group by C

distinct is what it sounds like -- you get unique rows.

In sql server 2005, it looks like the query optimizer is able to optimize away the difference in the simplistic examples I ran. Dunno if you can count on that in all situations, though.

score 4 · Answer 14 · answered Oct 02 '08 at 20:57

Please don't use GROUP BY when you mean DISTINCT, even if they happen to work the same. I'm assuming you're trying to shave off milliseconds from queries, and I have to point out that developer time is orders of magnitude more expensive than computer time.

answered Oct 02 '08 at 20:57 by Andy Lester

Usually, developer time is a fixed cost while computer time is incurred every time you run the query. So depending on your use case, it can be worth having your expensive developer optimize a query. – Peter Schuetze May 12 '22 at 15:57

Ram Ghadiyaram · Answer 15 · 2018-06-23T03:24:07.080

From a Teradata perspective:

From a result set point of view, it does not matter if you use DISTINCT or GROUP BY in Teradata. The answer set will be the same.

From a performance point of view, it is not the same.

To understand what impacts performance, you need to know what happens on Teradata when executing a statement with DISTINCT or GROUP BY.

In the case of DISTINCT, the rows are redistributed immediately without any preaggregation taking place, while in the case of GROUP BY, in a first step a preaggregation is done and only then are the unique values redistributed across the AMPs.

Don't assume that GROUP BY is always better from a performance point of view. When you have many different values, the preaggregation step of GROUP BY is not very efficient, because Teradata has to sort the data to remove duplicates. In this case, it may be better to do the redistribution first, i.e. use the DISTINCT statement. Only if there are many duplicate values is GROUP BY probably the better choice, since the deduplication step then shrinks the data before redistribution.

In short, DISTINCT vs. GROUP BY in Teradata means:

GROUP BY -> for many duplicates
DISTINCT -> no or only a few duplicates

At times, when using DISTINCT, you run out of spool space on an AMP. The reason is that redistribution takes place immediately, and skewing could cause AMPs to run out of space.

If this happens, you probably have a better chance with GROUP BY, as duplicates are already removed in a first step, and less data is moved across the AMPs.

Teradata is a Relational Database Management System (RDBMS), capable of supporting many concurrent users from various client platforms. Teradata is compatible with the ANSI standard and built completely on parallel architecture. — Ram Ghadiyaram, Jun 19 '18 at 21:35

score 3 · Answer 16 · answered Oct 02 '08 at 20:12

In that particular query there is no difference. But, of course, if you add any aggregate columns then you'll have to use group by.

answered Oct 02 '08 at 20:12 by Jeffrey L Whitledge

score 2 · Answer 17 · answered Oct 02 '08 at 20:16

You're only noticing that because you are selecting a single column.

Try selecting two fields and see what happens.

Group By is intended to be used like this:

SELECT name, SUM(transaction) FROM myTbl GROUP BY name

Which would show the sum of all transactions for each person.

answered Oct 02 '08 at 20:16 by Chris Cudmore

This is not a question of aggregates. In your example, SELECT c, d FROM mytbl GROUP BY C, D; will in fact return the same data set as SELECT DISTINCT C, D FROM mytbl; This is the fundamentals of the question – Brettski Oct 02 '08 at 20:33

score 2 · Answer 18 · answered Oct 03 '08 at 10:09

From a 'SQL the language' perspective the two constructs are equivalent and which one you choose is one of those 'lifestyle' choices we all have to make. I think there is a good case for DISTINCT being more explicit (and therefore is more considerate to the person who will inherit your code etc) but that doesn't mean the GROUP BY construct is an invalid choice.

I think this 'GROUP BY is for aggregates' is the wrong emphasis. Folk should be aware that the set function (MAX, MIN, COUNT, etc) can be omitted so that they can understand the coder's intent when it is.

The ideal optimizer will recognize equivalent SQL constructs and will always pick the ideal plan accordingly. For your real life SQL engine of choice, you must test :)

PS note the position of the DISTINCT keyword in the select clause may produce different results e.g. contrast:

SELECT COUNT(DISTINCT C) FROM myTbl;

SELECT DISTINCT COUNT(C) FROM myTbl;
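A quick sketch of that contrast, using Python's `sqlite3` module and six made-up values for `C` (the data and the column type are assumptions for illustration):

```python
import sqlite3

# Hypothetical data: six rows, three distinct values of C.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTbl (C INTEGER)")
conn.executemany("INSERT INTO myTbl VALUES (?)", [(1,), (1,), (2,), (3,), (3,), (3,)])

# DISTINCT inside the aggregate: duplicates are dropped before counting.
count_distinct = conn.execute("SELECT COUNT(DISTINCT C) FROM myTbl").fetchall()

# DISTINCT outside the aggregate: all rows are counted, and the single
# result row is then (trivially) de-duplicated.
distinct_count = conn.execute("SELECT DISTINCT COUNT(C) FROM myTbl").fetchall()

print(count_distinct)  # [(3,)]
print(distinct_count)  # [(6,)]
```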

score 2 · Answer 19 · answered Jan 29 '16 at 16:06

I know it's an old post, but I had a query that used GROUP BY just to return distinct values. When run from Toad and from Oracle Reports, everything worked fine, I mean with a good response time. When we migrated from Oracle 9i to 11g, the response time in Toad was still excellent, but the report took about 35 minutes to finish, where the previous version took about 5 minutes.

The solution was to change the group by and use DISTINCT and now the report runs in about 30 secs.

I hope this is useful for someone with the same situation.

score 2 · Answer 20 · edited Jul 02 '20 at 19:44

In Hive (HQL), GROUP BY can be way faster than DISTINCT, because the former does not require comparing all fields in the table.

See: https://sqlperformance.com/2017/01/t-sql-queries/surprises-assumptions-group-by-distinct.

edited Jul 02 '20 at 19:44 by David Refoua
answered Jul 01 '18 at 19:08 by John Jiang

SkyRar · Answer 21 · 2019-07-17T23:39:08.580

Sometimes they may give you the same results, but they are meant to be used in different cases. The main difference is in syntax.

Look closely at the example below. DISTINCT filters out duplicate sets of values. (6, cs, 9.1) and (1, cs, 5.5) are two different sets, so DISTINCT displays both rows, while GROUP BY Branch displays only one of them.

 SELECT * FROM student; 
+------+--------+------+
| Id   | Branch | CGPA |
+------+--------+------+
|    3 | civil  |  7.2 |
|    2 | mech   |  6.3 |
|    6 | cs     |  9.1 |
|    4 | eee    |  8.2 |
|    1 | cs     |  5.5 |
+------+--------+------+
5 rows in set (0.001 sec)

SELECT DISTINCT * FROM student; 
+------+--------+------+
| Id   | Branch | CGPA |
+------+--------+------+
|    3 | civil  |  7.2 |
|    2 | mech   |  6.3 |
|    6 | cs     |  9.1 |
|    4 | eee    |  8.2 |
|    1 | cs     |  5.5 |
+------+--------+------+
5 rows in set (0.001 sec)

SELECT * FROM student GROUP BY Branch;
+------+--------+------+
| Id   | Branch | CGPA |
+------+--------+------+
|    3 | civil  |  7.2 |
|    6 | cs     |  9.1 |
|    4 | eee    |  8.2 |
|    2 | mech   |  6.3 |
+------+--------+------+
4 rows in set (0.001 sec)

Sometimes the results that can be achieved with a GROUP BY clause cannot be achieved with DISTINCT without some extra clause or condition, e.g. in the case above.

To get the same result as DISTINCT, you have to pass all the column names in the GROUP BY clause, as below. Note the syntactic difference: you must know all the column names to use the GROUP BY clause in that case.

SELECT * FROM student GROUP BY Id, Branch, CGPA;
+------+--------+------+
| Id   | Branch | CGPA |
+------+--------+------+
|    1 | cs     |  5.5 |
|    2 | mech   |  6.3 |
|    3 | civil  |  7.2 |
|    4 | eee    |  8.2 |
|    6 | cs     |  9.1 |
+------+--------+------+

Also, I have noticed that GROUP BY displays the results in ascending order by default, which DISTINCT does not. But I am not sure about this; it may differ by vendor.
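The row counts from the session above can be reproduced with a short sketch in Python's `sqlite3` module. This is SQLite rather than the engine used above, and the column types are my assumption. SQLite happens to tolerate selecting non-grouped columns, but stricter engines and settings reject the `GROUP BY Branch` query, and which `cs` row it returns is not something to rely on, so only counts are compared.

```python
import sqlite3

# The five student rows shown above (column types are an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (Id INTEGER, Branch TEXT, CGPA REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(3, "civil", 7.2), (2, "mech", 6.3), (6, "cs", 9.1), (4, "eee", 8.2), (1, "cs", 5.5)],
)

# DISTINCT compares whole rows, so both cs rows survive.
distinct_all = conn.execute("SELECT DISTINCT * FROM student").fetchall()

# Grouping by Branch alone keeps one (arbitrary) row per branch.
by_branch = conn.execute("SELECT * FROM student GROUP BY Branch").fetchall()

# Grouping by every column is the GROUP BY spelling of DISTINCT *.
by_all = conn.execute("SELECT * FROM student GROUP BY Id, Branch, CGPA").fetchall()

print(len(distinct_all), len(by_branch), len(by_all))  # 5 4 5
```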

Source : https://dbjpanda.me/dbms/languages/sql/sql-syntax-with-examples#group-by

score 2 · Answer 22 · answered Sep 27 '19 at 09:47

In terms of usage, GROUP BY is used for grouping the rows you want to run calculations on. DISTINCT does not do any calculation; it just shows the rows without duplicates.

I always use DISTINCT if I want to present data without duplicates.

If I want to do calculations, like summing up the total quantity of mangoes, I use GROUP BY.

score 0 · Answer 23 · answered Oct 02 '08 at 21:05

The way I always understood it is that using distinct is the same as grouping by every field you selected in the order you selected them.

i.e:

select distinct a, b, c from table;

is the same as:

select a, b, c from table group by a, b, c
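Here is a minimal sketch of that equivalence using Python's `sqlite3` module. The table is named `t` (a hypothetical name, since `table` is a reserved word) and the rows are invented. Results are compared as sorted lists because neither query promises an output order.

```python
import sqlite3

# Hypothetical rows with two exact duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 1), (2, 2, 3), (2, 2, 3)],
)

distinct_rows = sorted(conn.execute("SELECT DISTINCT a, b, c FROM t").fetchall())
grouped_rows = sorted(conn.execute("SELECT a, b, c FROM t GROUP BY a, b, c").fetchall())

# Selecting the columns in a different order than they are grouped
# yields the same groups, just with the columns rearranged.
reordered = conn.execute("SELECT c, b, a FROM t GROUP BY a, b, c").fetchall()
reordered_back = sorted((a, b, c) for (c, b, a) in reordered)

print(distinct_rows)   # [(1, 1, 1), (1, 2, 1), (2, 2, 3)]
print(grouped_rows == distinct_rows, reordered_back == distinct_rows)  # True True
```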

answered Oct 02 '08 at 21:05 by Zenshai

Agreed, but would it be same as select c,b,a from table group by a,b,c – Dheer Oct 03 '08 at 16:37
Yes, it would be the same – Caius Jard Oct 21 '18 at 03:30

score 0 · Answer 24 · answered Jan 09 '18 at 04:40

Functional efficiency is totally different. If you only want to select the returned values without duplicates, using DISTINCT is better than GROUP BY, because GROUP BY involves (sorting + removing) while DISTINCT involves (removing) only.

answered Jan 09 '18 at 04:40 by Jun

SELECT TOP 1 would work even better for a single return value. Query execution may also be a factor if you want results from indexed fields. – CZahrobsky Apr 05 '23 at 16:05

score 0 · Answer 25 · answered Aug 30 '19 at 15:15

Generally we can use DISTINCT to eliminate duplicates on a specific column in the table.

With GROUP BY we can apply aggregate functions like AVG, MAX, MIN, SUM, and COUNT to a specific column and fetch the column name together with the aggregate result for that column.

Example :

select  specialColumn,sum(specialColumn) from yourTableName group by specialColumn;

score -1 · Answer 26 · edited Apr 03 '22 at 05:33

There is no significant difference between the GROUP BY and DISTINCT clauses except for the use of aggregate functions. Both can be used to obtain the distinct values, but from a performance point of view GROUP BY is better. When the DISTINCT keyword is used, it internally uses a sort operation, which can be seen in the execution plan.

Try a simple example:

Declare @tmpresult table
(
  Id tinyint
)

Insert into @tmpresult
Select 5
Union all
Select 2
Union all
Select 3
Union all
Select 4


Select distinct 
Id
From @tmpresult

edited Apr 03 '22 at 05:33 by RF1991
answered Feb 10 '15 at 16:56 by Vinod Narwal

distinct and group by both will – vignesh Dec 30 '16 at 08:20
