-1

In Oracle, which of the following three queries is most efficient:

SELECT DISTINCT a, b
FROM tab  

SELECT a, b
FROM tab
GROUP BY a, b

SELECT a, b
FROM
(SELECT a, b, row_number() OVER (PARTITION BY a, b ORDER BY a, b) rn
 FROM tab )
WHERE rn = 1
sinelaw
  • 16,205
  • 3
  • 49
  • 80
Isaac Kleinman
  • 3,994
  • 3
  • 31
  • 35
  • 3
    Can you take execution plans and see the cost of each? – mservidio Apr 27 '12 at 20:25
  • 4
    It depends. If one were always better than the others, then the others would not exist. I will say, I only use `GROUP BY` if I'm performing some aggregate function (`SUM`,`MIN`,`MAX`), and I avoid `DISTINCT` in all but the simplest cases, because more often than not, it's the developer saying, "I'm getting duplicate rows back, and I don't understand the data model, so I'll slap a `DISTINCT` in here," and then down the road, some combination of data invalidates original implicit assumptions, and reports "break". – Tebbe Apr 27 '12 at 20:29
  • 4
    P.S. In your analytic function, since you're already `PARTITION`ing `BY` `a` and `b`, you'll never also have the opportunity to `ORDER BY` `a` and/or `b`: your `ORDER BY` clause is essentially a no-op. So either `ORDER BY` another (non-`a`, non-`b`) column, or, if you don't care, `ORDER BY NULL`. – Tebbe Apr 27 '12 at 20:32
  • I seriously doubt that the `ROW_NUMBER()` version is better than the other two. The query optimizer _may_ be able to convert it to something similar to the others, but I sort of doubt it. And yeah, you should be able to remove the `ORDER BY` in there. – Clockwork-Muse Apr 27 '12 at 21:44
  • It depends... If you were going to join this query onto another then I'd always use `row_number()` over distinct. You first two questions are [answered here](http://stackoverflow.com/questions/164319/is-there-any-difference-between-group-by-and-distinct), including in Oracle, where someone has looked at the execution plan. – Ben Apr 28 '12 at 08:42
  • @X-Zero : Oracle insists on `ORDER BY` clause, but Tebbe's suggestion to `ORDER BY NULL` does seem to make sense. @BobJarvis : when it comes to a query doesn't efficient = speed? – Isaac Kleinman Apr 30 '12 at 15:00
  • @Ben : Can you please explain why you would use `row_number()` when joining the query with another one (indeed, that is my case)? – Isaac Kleinman Apr 30 '12 at 15:58

1 Answers1

1

The first one is the correct choice, because the others are quirky and non-standard (and slightly perverse) ways of achieving the same aim.

David Aldridge
  • 51,479
  • 8
  • 68
  • 96