1

I have a list of network logs. There is a time stamp for each row and I need to pull the distinct timestamps such that I can use these for later queries. At the moment I am using the following query:

select distinct time from metastatus

Now I firstly want to confirm that the select distinct query returns sorted values?

Secondly I have already created an index on time which the database is using for the above query.

Lastly I did think about retrieving all the times and then finding the unique values through my code however there are too many value for me to retrieve all the times (160,000,000).

Thanks, James

James Elder
  • 1,583
  • 3
  • 22
  • 34
  • I am currently running a query to cluster the database by time index i.e. `cluster metastatus using timeindex` – James Elder Dec 04 '12 at 11:37
  • 1
    Select distinct doesn't guarantee sorted values, you need an order by on the end. The optimizer may decide to sort the values to produce the distinct list, but it's not the only approach it has. – Laurence Dec 04 '12 at 11:44
  • Thank you, I had the `order by` but I removed it as I saw it returned ordered values and assumed it did this every time. I'll put it back. – James Elder Dec 04 '12 at 11:48

1 Answers1

1

The usual method to get DISTINCT values used by PostgreSQL is to sort the rows, then pick the first of each set. That's why you normally get a sorted result, but Postgres asserts nothing as to the order. Just add an ORDER BY that agrees with the DISTINCT clause. It's better from a documentation point of view in any case. Doesn't usually cost anything extra. Check with EXPLAIN ANALYZE to confirm.

The idea to retrieve all rows and sort outside of Postgres is nonsense. There is no chance this will even get close to the performance of the RBDMS which is optimized for just such a task. You would also transfer quite a bit of noise over the wire. Consider the answers under this related question.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I understand that the performance of an SQL query is likely to be better than any code I write to sort rows. I am not doing it through my code anyway because there are too many rows and as you said it would likely be slower. I was just going to try both methods for a comparison of the time that's the only reason I mentioned it, but it's not possible. – James Elder Dec 04 '12 at 12:02
  • This is bad advice. An `order by` on a large table can have even worse performance than the `select distinct` query. – Cerin Jan 27 '14 at 14:48