2

I have a table myTable with four columns:

id        UUID,
user_id   UUID ,
text      VARCHAR ,
date      TIMESTAMP

(id is the primary key and user_id is not unique in this table)

I want to retrieve the user_ids ordered by their newest entry, which i am currently doing with this query:

SELECT user_id FROM myTable GROUP BY user_id ORDER BY MAX(date) DESC

The problem is that GROUP BY takes a long time. Is there a faster way to accomplish this? I tried using a window function with PARTITION BY as described here Retrieving the last record in each group - MySQL, but it didn't really speed things up. I've also made sure that user_id is indexed.

My postgres version is 10.4

Edit: The query above that I'm currently using is functionally correct, the problem is that it's slow.

GMB
  • 216,147
  • 25
  • 84
  • 135
guest856
  • 171
  • 2
  • 11
  • 1
    What do you mean with "ordered by their newest entry"? Do you want to see **only** the newest entry (what your title suggests) or **all** of them, but with some kind of special sort order? –  Jul 29 '20 at 14:23
  • I want **all** the user_ids, sorted by each user_id's newest date. The query above is correct, it's just slow. I'm not looking for complete entries, only the user_ids. – guest856 Jul 29 '20 at 14:27
  • Then why can't you simply use `select user_id from the_table order by user_id, "date" desc`? –  Jul 29 '20 at 14:38
  • Perhaps i'm explaining this poorly, I want all the user_ids, but each user_id exactly one time. – guest856 Jul 29 '20 at 15:15

2 Answers2

3

Your query seems like a relevant approach for your requirement:

select user_id 
from mytable 
group by user_id 
order by max(date) desc

I would recommend an index on (user, date desc) to speed things up. It needs to be a single index on both colums.

You could also give a try to distinct on, which might, or might not, give you better performance:

select user_id
from (
    select distinct on(user_id) user_id, date
    from mytable
    order by user_id, date desc
) t
order by date desc
GMB
  • 216,147
  • 25
  • 84
  • 135
1

Start with an index on user_id, date desc. That might help.

You can also try filtering -- once you have such an index:

select t.user_id
from myTable t
where t.date = (select max(t2.date)
                from myTable t2
                where t2.user_id = t.user_id
               )
order by t.date desc

However, you might find that the order by ends up taking almost as much time as the group by.

This version will definitely use the index for the subquery:

select user_id
from (select distinct on (user_id) user_id, date
      from myTable t
      order by user_id, date desc
     ) t
order by date desc;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    Adding the index sped things up :). In addition, both your suggestions turned out to be faster than my current query, although the first approach gives some duplicates, presumably because some user_ids have multiple entries with the same newest date. – guest856 Jul 29 '20 at 15:03