4

I am trying to query my PostgreSQL database to get the latest (by created_at) and distinct (by user_id) Activity objects, where each user has multiple activities in the database. The activity object is structured as such:

Activity(id, user_id, created_at, ...)

I first tried to get the below query to work:

Activity.order('created_at DESC').select('DISTINCT ON (activities.user_id) activities.*')

however, kept getting the below error:

ActiveRecord::StatementInvalid: PG::InvalidColumnReference: ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

According to this post: PG::Error: SELECT DISTINCT, ORDER BY expressions must appear in select list, it looks like The ORDER BY clause can only be applied after the DISTINCT has been applied. This does not help me, as I want to get the distinct activities by user_id, but also want the activities to be the most recently created activities. Thus, I need the activities to be sorted before getting the distinct activities.

I have come up with a solution that works, but first grouping the activities by user id, and then ordering the activities within the groups by created_at. However, this takes two queries to do.

I was wondering if what I want is possible in just one query?

fool-dev
  • 7,671
  • 9
  • 40
  • 54

1 Answers1

2

This should work, try the following

Solution 1

Activity.select('DISTINCT ON (activities.user_id) activities.*').order('created_at DESC')

Solution 2

If not work Solution 1 then this is helpful if you create a scope for this

activity model

scope :latest, -> {
                    select("distinct on(user_id) activities.user_id,
                                activities.*").
                    order("user_id, created_at desc")
                }

Now you can call this anywhere like below

Activity.latest

Hope it helps

fool-dev
  • 7,671
  • 9
  • 40
  • 54
  • @MarcusChristiansen let's try the solution – fool-dev Jan 26 '18 at 17:58
  • Hi @fool-dev, this seems to be working! Thank you! If you have any time, could you quickly explain how this works? If I exclude the order clause, I don't get the most recent one, but if I include it, I get the most recent one. This seems odd, as it seems that the order clause is changing the contents of the sql query rather than ordering the existing contents. – Marcus Christiansen Jan 26 '18 at 18:14
  • 2
    @MarcusChristiansen that's a feature of `DISTINCT ON`: it only picks the **first** record for every distinct value of the given expression (whether a single value, as done here, or a tuple, if you specified multiple). Usage of `DISTINCT ON` requires that results are first ordered by the same expressions, but subsequent ordering defines which records will appear first and will make it into the result set. Postgres' docs explain this pretty well. – D-side Jan 28 '18 at 04:22