PostgreSQL does not allow me to group a column with order

Question

In PostgreSQL I want to fetch every user at once and order them by date.

This is my query:

SELECT id, useridx, isread, message, date
  FROM messages
 WHERE isread = 1
 GROUP BY useridx
 ORDER BY date DESC

This is some sample data:

------------------------------------------------------
  id  | useridx | isread | message  | date
------------------------------------------------------
  1   | 1       | 0      | Hello    | 2012-01-01
  2   | 2       | 1      | Hi       | 2012-01-02
  3   | 3       | 1      | Test     | 2012-01-03
  4   | 3       | 0      | My Msg   | 2012-01-04
  5   | 4       | 1      | sadasd   | 2012-01-05
  6   | 4       | 1      | sdfsdfd  | 2012-01-06
  7   | 4       | 0      | sdfsdfsd | 2012-01-07
  8   | 5       | 0      | 5345634  | 2012-01-08
  9   | 6       | 0      | sdfdfsd  | 2012-01-09
  10  | 7       | 0      | sdfsdfsf | 2012-01-10
------------------------------------------------------

What I want to do is group these rows by useridx and order the result by date.

Expected Result:

------------------------------------------------------
  id  | useridx | isread | message  | date
------------------------------------------------------
  6   | 4       | 1      | sdfsdfd  | 2012-01-06
  3   | 3       | 1      | Test     | 2012-01-03
  2   | 2       | 1      | Hi       | 2012-01-02
------------------------------------------------------

Actual Result:

ERROR:  column "messages.date" must appear in the GROUP BY clause or be used in an aggregate function

I do not want to group by date as well. I just want to group by useridx and sort the result by date DESC.

Any help/idea is appreciated!

~~Note: I also tried DISTINCT. It did not fit my needs, or I used it incorrectly.~~

I am very confused and stuck between DISTINCT ON and rank() methods.

Conclusion: Anyone who hits the same problem can read this as an answer. Both @kgrittn's and @mu is too short's answers are correct. I will keep using both answers and schemas in my project, and in time I should learn which one is best, I guess. So pick one of them and get on with your work. You will be just fine.

Last Update: Sometimes DISTINCT ON excludes some ids from the result. Let's say I have an id column and 6 rows that are otherwise the same: DISTINCT ON excludes them from the result, but rank() returns them. So, use rank()!

It's better to write your own complete answer if you feel there is a need to sum up a complete solution, or, if any answer can be improved, you can suggest edits. – Zia Ul Rehman Mughal, Nov 29 '16 at 07:03

score 13 · Answer 1 · edited May 08 '19 at 10:47

PostgreSQL, unlike MySQL, does not show random data for columns which are not aggregated in an aggregated query.

The solution is in the error message:

ERROR:  column "messages.date" must appear in the GROUP BY clause or be used in an aggregate function

This means you must either GROUP BY the "messages.date" column or use an aggregate function such as MIN() or MAX() when selecting this column.

Example:

SELECT MIN(id), useridx, isread, message, MAX(date)
FROM messages WHERE isread = 1 
GROUP BY useridx, isread, message
ORDER BY MAX(date) DESC
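
For comparison, here is a sketch that groups by useridx alone, so every other selected column has to go through an aggregate. The inline VALUES list is only a stand-in for the read rows of the question's sample data, and the aliases `min_id` and `latest_date` are made up for illustration. Each aggregate is computed on its own, so the values in one output row need not come from the same message:

```sql
-- Stand-in for the isread = 1 rows of the question's sample data.
WITH messages (id, useridx, isread, message, date) AS (
    VALUES (2, 2, 1, 'Hi',      DATE '2012-01-02'),
           (3, 3, 1, 'Test',    DATE '2012-01-03'),
           (5, 4, 1, 'sadasd',  DATE '2012-01-05'),
           (6, 4, 1, 'sdfsdfd', DATE '2012-01-06')
)
SELECT MIN(id)   AS min_id,       -- smallest id in the group
       useridx,
       MAX(date) AS latest_date   -- newest date in the group
FROM messages
WHERE isread = 1
GROUP BY useridx                  -- exactly one row per useridx
ORDER BY MAX(date) DESC;
```

For useridx 4 this pairs id 5 with the date 2012-01-06, which belongs to id 6, so the output row does not match any single message in the table.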

edited May 08 '19 at 10:47

chooban

answered Apr 26 '12 at 22:56

ilanco

I am confused right now. Your solution worked too. Which one is the correct way? Yours or vyegorov's? – flower58 Apr 26 '12 at 23:01
It depends which data you want to get back. Look for example at rows with id 3 and 4, they both have useridx 3. When you GROUP by useridx you have to tell postgres which of the id values you want ... – ilanco Apr 26 '12 at 23:09
I just realized that when I run your query on a large table, it sometimes shows a useridx twice because the message is different? – flower58 Apr 26 '12 at 23:18

mu is too short · Accepted Answer · 2012-04-26T23:41:42.610

You want to use the rank() window function to order the results within each useridx group and then peel off the first one by wrapping the ranked results in a derived table:

select id, useridx, isread, message, date
from (
    select id, useridx, isread, message, date,
           rank() over (partition by useridx order by date desc) as r
    from messages
    where isread = 1
) as dt
where r = 1

That will give you the rows with id 2, 3, and 6 from your sample. You might want to add a secondary sort key in the over clause so that the choice is consistent when a useridx has multiple messages on the same date.
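
A minimal sketch of that secondary sort key, assuming id is unique and that the higher id should win when dates tie (both are assumptions, the question does not say). The inline values list stands in for the read rows of the sample table, plus a hypothetical row 11 that shares a date with row 6:

```sql
-- Stand-in data; row 11 is hypothetical and ties with row 6 on date.
with messages (id, useridx, isread, message, date) as (
    values (2,  2, 1, 'Hi',      date '2012-01-02'),
           (3,  3, 1, 'Test',    date '2012-01-03'),
           (5,  4, 1, 'sadasd',  date '2012-01-05'),
           (6,  4, 1, 'sdfsdfd', date '2012-01-06'),
           (11, 4, 1, 'tie',     date '2012-01-06')
)
select id, useridx, isread, message, date
from (
    select id, useridx, isread, message, date,
           -- id desc breaks ties on date, so only one row per useridx gets r = 1
           rank() over (partition by useridx order by date desc, id desc) as r
    from messages
    where isread = 1
) as dt
where r = 1
order by date desc;
```

With the tie-breaker this returns ids 11, 3 and 2. Ordering the window by date desc alone would give both id 6 and id 11 a rank of 1, so useridx 4 would appear twice.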

You'll need at least PostgreSQL 8.4 (AFAIK) to have window functions.

edited Apr 26 '12 at 23:41

answered Apr 26 '12 at 23:25

mu is too short

Omg. This also worked like a charm. I am really confused about which answer I should go with... – flower58 Apr 26 '12 at 23:35
@GencerGenç: I'd go with mine of course :) This is exactly the sort of situation the window functions are meant for. vyegorov's has a problem if a user has multiple messages on one date, ilanco's can produce rows that aren't in the table; these are the usual problems with non-window-function approaches to these problems. – mu is too short Apr 26 '12 at 23:42
Thank you! I'm just done with this. The last thing I have to do is apply this code to Zend_Db. Answer is accepted. – flower58 Apr 26 '12 at 23:46
Wait a minute. What if I want to add more than one WHERE condition? `r = 1` probably won't work, right? – flower58 Apr 26 '12 at 23:59
@GencerGenç: I'm not sure what you mean. Do you want to add more conditions around `isread = 1` or do you want more results per-user? – mu is too short Apr 27 '12 at 01:09
I meant more conditions around `isread = 1`. If I add more conditions, do I have to change `r = 1`? – flower58 Apr 27 '12 at 09:52
@GencerGenç: No, you can add more conditions to the inner query as needed. Try running the inner query by hand and you'll see how `rank()` behaves. – mu is too short Apr 27 '12 at 17:32
Btw, `DISTINCT ON` sometimes excludes some rows from the result. Your solution is the correct one! – flower58 May 01 '12 at 12:51

score 5 · Answer 3 · answered Apr 27 '12 at 12:21

Another option is to use SELECT DISTINCT ON (which is very different from a simple SELECT DISTINCT):

SELECT *
  FROM (SELECT DISTINCT ON (useridx)
            id, useridx, isread, message, date
          FROM messages
          WHERE isread = 1
          ORDER BY useridx, date DESC) x
  ORDER BY date DESC;

In some cases this can scale better than the other approaches.
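
DISTINCT ON keeps the first row of each useridx group in ORDER BY order, so when two messages share a date, which one survives is unpredictable unless ORDER BY also breaks the tie. A sketch under the assumption that id is unique and the higher id should win; the inline VALUES list stands in for the read rows of the sample table, and row 11 is hypothetical:

```sql
-- Stand-in data; row 11 is hypothetical and ties with row 6 on date.
WITH messages (id, useridx, isread, message, date) AS (
    VALUES (2,  2, 1, 'Hi',      DATE '2012-01-02'),
           (3,  3, 1, 'Test',    DATE '2012-01-03'),
           (6,  4, 1, 'sdfsdfd', DATE '2012-01-06'),
           (11, 4, 1, 'tie',     DATE '2012-01-06')
)
SELECT *
  FROM (SELECT DISTINCT ON (useridx)
            id, useridx, isread, message, date
          FROM messages
          WHERE isread = 1
          -- id DESC is the assumed tie-breaker for equal dates
          ORDER BY useridx, date DESC, id DESC) x
  ORDER BY date DESC;
```

This always yields exactly one row per useridx (ids 11, 3 and 2 here), which is the main behavioral difference from rank() when dates tie.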

kgrittn

Hmm. I am really confused right now! This also works like a charm. Both `rank()` and this give me the exact result on a large table. So, which one is the correct way now: `rank()` or `DISTINCT ON`? If possible, I want to discuss this. – flower58 Apr 27 '12 at 15:20
Run `EXPLAIN ANALYZE ...` for both queries and compare the results. – vyegorov Apr 27 '12 at 15:50
@vyegorov I ran `EXPLAIN ANALYZE` for both `DISTINCT ON` and `rank()`, and I can only say that the two are **almost** the same, and after the first few lines exactly the same. For both, the total runtime is around 3.474 ms and execution is around 0.077 ms. So I am still wondering which method to go with... – flower58 Apr 27 '12 at 16:13
What do you mean by _large table_ then? How many records are there in `messages`? – vyegorov Apr 27 '12 at 16:15
I have a message table that contains around a million messages. Both solutions are very fast and effective, but I don't know which one to go with... – flower58 Apr 27 '12 at 16:20
If both perform well and give the right results, you should go with the one you find easier to understand. In the long run that will reduce maintenance cost. – kgrittn Apr 28 '12 at 01:01

score 4 · Answer 4 · answered Aug 26 '16 at 08:29

Years later, but can't you just order in the FROM subquery:

SELECT m.id, m.useridx, m.isread, m.message, m.date
FROM (
   SELECT m2.id, m2.useridx, m2.isread, m2.message, m2.date 
   FROM messages m2 
   ORDER BY m2.id ASC, m2.date DESC
) m
WHERE isread = 1
GROUP BY useridx

This works for me in PostgreSQL 9.2

score 2 · Answer 5 · answered Apr 26 '12 at 22:34

You are aggregating results.

This means that instead of 2 rows for user 3 you will have just one row. But you also select the id, message, and isread columns for the aggregated row. How is PostgreSQL supposed to deliver this data? Should it be the max() of the possible values? Maybe the min()?

I assume that you'd like to have the data for the newest messages. Try this query:

SELECT id, useridx, isread, message, date FROM messages
 WHERE isread = 1 AND (useridx, date) IN
  (SELECT useridx, max(date) FROM messages WHERE isread = 1 GROUP BY useridx);

vyegorov

Thanks for the quick reply, but I applied your sample and the rows are still mixed. I am unable to sort them by date. :( – flower58 Apr 26 '12 at 22:42
Hmm. For now I added ORDER BY date DESC at the end of the query and it worked. If that is incorrect, please let me know. – flower58 Apr 26 '12 at 22:51
What happens if you have multiple messages for one `useridx` on the same date? – mu is too short Apr 26 '12 at 23:44
Your solution also works! I tested in a large database and worked like a charm! – flower58 Apr 27 '12 at 10:31
@muistooshort is right about mine being vulnerable to cases where several messages fall on the same date. Yes, it works in this particular case, but his solution with rank() is the proper one. – vyegorov Apr 27 '12 at 10:35
