6

In a MySQL query, when using the DISTINCT option, does ORDER BY apply after the duplicates are removed? If not, is there any way to make it do so? I think it's causing some issues with my code.

EDIT:
Here's some more information about what's causing my problem. I understand that, at first glance, this order would not be important, since I am dealing with duplicate rows. However, this is not entirely the case, since I am using an INNER JOIN to sort the rows.

Say I have a table of forum threads, containing this data:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  1 |      1 | Information |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
+----+--------+-------------+

I also have a set of posts in another table like this:

+----+----------+--------+---------+
| id | threadid | userid | content |
+----+----------+--------+---------+
|  1 |        1 |      1 | Lorem   |
|  2 |        1 |      2 | Ipsum   |
|  3 |        2 |      2 | Test    |
|  4 |        3 |      1 | Foo     |
|  5 |        2 |      3 | Bar     |
|  6 |        3 |      5 | Bob     |
|  7 |        1 |      2 | Joe     |
+----+----------+--------+---------+

I am using the following MySQL query to get all threads, then sort them based on the latest post (assuming that posts with higher ids are more recent:

SELECT t.*
FROM Threads t
INNER JOIN Posts p ON t.id = p.threadid
ORDER BY p.id DESC

This works, and generates something like this:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  1 |      1 | Information |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
|  1 |      1 | Information |
+----+--------+-------------+

However, as you can see, the information is correct, but there are duplicate rows. I'd like to remove such duplicates, so I used SELECT DISTINCT instead. However, this yielded the following:

+----+--------+-------------+
| id | userid |    title    |
+----+--------+-------------+
|  3 |      2 | Support     |
|  2 |      1 | FAQ         |
|  1 |      1 | Information |
+----+--------+-------------+

This is obviously wrong, since the "Information" thread should be on top. It would seem that using DISTINCT causes the duplicates to be removed from the top to the bottom, so only the final rows are left. This causes some issues in the sorting.

Is this the case, or am I analyzing things incorrectly?

Alexis King
  • 43,109
  • 15
  • 131
  • 205
  • 2
    What issue do you think it's causing? What difference would it make? – eggyal Jun 05 '12 at 21:10
  • 1
    why would it matter? before or after applying distinct, the order should be the same – Rodolfo Jun 05 '12 at 21:12
  • 1
    can you show us a sample query of what you are trying and the actual problem you are running into? – Ryan Erickson Jun 05 '12 at 21:13
  • I believe the OP is referring to which row values are returned. For example, are the results sorted *after* `DISTINCT` sorted (and therefore an unsorted value within the `DISTINCT` could be presented) or before? – 0b10011 Jun 05 '12 at 21:17
  • 1
    @bfrohs - Doesn't make any sense to me. You would get the same results if you sort the rows first then remove the duplicates as opposed to removing the duplicates first and then sorting what remains. – Martin Smith Jun 05 '12 at 21:19
  • For example, if a row is grouped by `user`. User #1 has two `name` values: 'c' and 'a'. User #2 has one `name` value: 'b'. Sorted by `id` first (asc), the return would be (1:a;2:b;). However, if DISTINCT is run first, the return would be (1:c;2:b;). – 0b10011 Jun 05 '12 at 21:20
  • @MartinSmith, see http://stackoverflow.com/questions/10905026/does-order-by-apply-before-or-after-distinct#comment14218937_10905026 – 0b10011 Jun 05 '12 at 21:21
  • @bfrohs - Your link is back to this question! Is this some weird MySQL thing then? `ORDER BY` ought to specify the order of the results returned to the client. Does `DISTINCT` have some side effect in MySQL that would prevent that? – Martin Smith Jun 05 '12 at 21:22
  • 1
    @bfrohs, but with DISTINCT you'd get (1:a;1:c;2:b). – Marcus Adams Jun 05 '12 at 21:23
  • @MartinSmith, it's back to the comment on the question--just wanted to make sure you were looking at the right one. And yes, I'm fairly confident the second example (DISTINCT is run first) is true. – 0b10011 Jun 05 '12 at 21:23
  • @MarcusAdams, good call, I'm thinking of `GROUP BY` mixed with `DISTINCT`. – 0b10011 Jun 05 '12 at 21:24
  • @eggyal Updated the question with a lot more information. Please see the context. – Alexis King Jun 05 '12 at 22:19

3 Answers3

5

Two things to understand:

  1. Generally speaking, resultsets are unordered unless you specify an ORDER BY clause; to the extent that you specify a non-strict order (i.e. ORDER BY over non-unique columns), the order in which records that are equal under that ordering appear within the resultset is undefined.

    I suspect you may be specifying such a non-strict order, which is the root of your problems: ensure that your ordering is strict by specifying ORDER BY over a set of columns that is sufficient to uniquely identify each record for which you care about its final position in the resultset.

  2. DISTINCT may use GROUP BY, which causes the results to be ordered by the grouped columns; that is, SELECT DISTINCT a, b, c FROM t will produce a resultset that appears as though ORDER BY a, b, c has been applied. Again, specifying a sufficiently strict order to meet your needs will override this effect.


Following your update, bearing in mind my point #2 above, it is clear that the effect of grouping the results to achieve DISTINCT makes it impossible to then order by the non-grouped column p.id; instead, you want:

SELECT   t.*
FROM     Threads t INNER JOIN Posts p ON t.id = p.threadid
GROUP BY t.id
ORDER BY MAX(p.id) DESC
Community
  • 1
  • 1
eggyal
  • 122,705
  • 18
  • 212
  • 237
  • Awesome, thanks, that works. So, just to confirm, `MAX()` compares using the max value of `p.id` in each group? – Alexis King Jun 05 '12 at 23:13
1

DISTINCT informs MySQL how to build a rowset for you, ORDER BY gives a hint how this rowset should by presented. So the answer is: DISTINCT first, ORDER BY last.

ManieQ
  • 330
  • 2
  • 8
  • 3
    But, in reality, `DISTINCT` is implemented by sorting the results... so perhaps not if the optimiser uses the same ordering for both tasks. – eggyal Jun 05 '12 at 21:14
  • In this case, as eggyval points out, there is an [exception](http://forge.mysql.com/w/images/a/ae/HowMySQLHandlesOrderGroupDistinct.pdf). When DISTINCT is grouped with ORDER BY, it does the sorting (filesort) first. – Marcus Adams Jun 05 '12 at 21:49
1

The order in which DISTINCT and ORDER BY are applied, in most cases, will not affect the final output.

However, if you also use GROUP BY, this will affect the final output. In this case, the ORDER BY is performed after the GROUP BY, which will return unexpected results (assuming you expect the sort to be performed before the grouping).

0b10011
  • 18,397
  • 4
  • 65
  • 86
  • [`DISTINCT` may use `GROUP BY`](http://dev.mysql.com/doc/en/distinct-optimization.html). What would performing ordering *before* grouping accomplish that performing it afterwards doesn't (bearing in mind that selecting ungrouped columns without an aggregation function results in indeterminate results - not relevant in this case anyway as `DISTINCT` ensures no such columns exist)? – eggyal Jun 05 '12 at 21:30
  • @eggyal, the issue isn't with `DISTINCT`, but with `GROUP BY` and `ORDER BY`. If rows are grouped, but not selected, `DISTINCT` doesn't help anything, and the query could return the "wrong" row values (e.g. an `id`, that is later used to retrieve values). – 0b10011 Jun 05 '12 at 21:41