6

I'm trying to select only the IDs of a table that I'm querying on, and still be able to specify ordering on other columns.

First I tried simply doing:

SELECT DISTINCT countries.id
FROM countries
...
ORDER BY province_infos.population DESC, country_infos.population ASC

That won't work, because for SELECT DISTINCT, ORDER BY expressions must appear in select list, and returns an error.

If I add province_infos.population and country_infos.population, it works, but I then get duplicate IDs, which I cannot have.

To resolve this, i attempted using DISTINCT ON():

SELECT DISTINCT ON (countries.id)
    countries.id, country_infos.population, province_infos.population
FROM countries
...
ORDER BY province_infos.population DESC, country_infos.population ASC

That then gives me the error SELECT DISTINCT ON expressions must match initial ORDER BY expressions. I can't SELECT DISTINCT ON a column without ordering it too.

It seems the only way for this to work, is to do something like:

SELECT DISTINCT ON (countries.id) 
    countries.id
FROM countries
...
ORDER BY countries.id DESC, province_infos.population DESC, country_infos.population ASC

I unfortunately can't do this, since I cannot order by IDs, as it skews the results of the other orders. And it seems the only way to not order by the IDs, is if I remove the DISTINCT from the select, but then I'll get duplicates.

Anyone know how I can work around this?

EDIT: The ... I omitted shouldn't be relevant, but in case you want to see:

JOIN country_infos ON country_infos.country_refer = countries.id
JOIN languages ON languages.country_refer = countries.id
JOIN provinces ON provinces.country_refer = countries.id
JOIN province_infos ON province_infos.province_refer = provinces.id
WHERE country_infos.population > 10.3
AND languages.alphabet = 'Latin'

And I'm not just trying to get this working for this specific query. This is just an example I'm using to explain the predicament. I'm generating these kinds of queries automatically off of an arbitrary data structure.

robbieperry22
  • 1,753
  • 1
  • 18
  • 49
  • Show us db schema, sample data, current and expected output. Please read [**How-to-Ask**](http://stackoverflow.com/help/how-to-ask) And here is a great place to [**START**](http://spaghettidba.com/2015/04/24/how-to-post-a-t-sql-question-on-a-public-forum/) to learn how improve your question quality and get better answers. [**How to create a Minimal, Complete, and Verifiable example**](http://stackoverflow.com/help/mcve) – Juan Carlos Oropeza Nov 30 '18 at 22:24
  • This is just an example I'm using to explain the problem of selecting a single distinct column, while ordering on other columns. – robbieperry22 Nov 30 '18 at 22:55
  • we still need some sample data and expected output – Juan Carlos Oropeza Nov 30 '18 at 23:35

2 Answers2

8

The general answer to your question is that when using DISTINCT ON (x, ...) in SELECT statement in postgresql, the database sorts by the values in the distinct clause in order to make it easy to tell if the rows have distinct values (once they're ordered by the values, it only takes one pass for the db to remove duplicates, and it only needs to compare adjacent rows. Because of this, the db forces you to sort by the same columns in the distinct clause.

You can work around this by making your original query a subquery, like so:

SELECT t.id FROM
  (SELECT DISTINCT ON (countries.id) countries.id
    , province_infos.population
    , country_infos.founding_date
   FROM countries
   ...
   ORDER BY countries.id, province_infos.population DESC, country_infos.founding_date  ASC 
  )t
ORDER BY t.population DESC, T.founding_date ASC
George S
  • 2,041
  • 9
  • 13
0

Use GROUP BY, something like this:

SELECT c.id
FROM countries c
...
GROUP BY c.id
ORDER BY MAX(pi.population) DESC, MAX(ci.population) ASC;

Actually, given the nature of your problem, you might want SUM():

SELECT c.id
FROM countries c
...
GROUP BY c.id
ORDER BY SUM(pi.population) DESC, SUM(ci.population) ASC;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Not sure if I fully understand your answer. If I try doing that, I still get the error `column "country.id" must appear in the GROUP BY clause or be used in an aggregate function` – robbieperry22 Nov 30 '18 at 22:12
  • Also, I may be ordering by strings and dates too. I just used the populations as an example to explain the problem. – robbieperry22 Nov 30 '18 at 22:13
  • @robbieperry22 . . . I seemed to have left that important line out of the code. If you are using strings/dates, then use `MIN()` or `MAX()`. – Gordon Linoff Nov 30 '18 at 22:25
  • I think you want use `SUM()` instead of `MAX()` and yes those `....` seem important. So please include it on the question. But I doubt two countries have the exact same population so the second `ORDER BY` is probably irrelevant. Also if you sum by province or counties you should get same total population – Juan Carlos Oropeza Nov 30 '18 at 22:25
  • I added the code omitted by the `...`, but it is just `joins` and `wheres` used for this specific query and ordering example. I also clarified what I'm trying to achieve here. Thanks. – robbieperry22 Nov 30 '18 at 22:43