13

This is a problem that I am stumbling across frequently. There have been some similar questions about this problem, but none of them was very complete (And they might possibly be outdated since Rails 4 might have introduced new functions that help with this problem)

Let me give a simple example of the problem and the known ways to 'solve' the problem:


Say I have a User Model and a Post Model, and a User has_many :posts

Now, I want to get a top five of the users with the most posts.

The following are options I know, but they all have their own drawbacks:

1)

users = User.all
@top_users = users.sort {|a,b| a.posts.count <=> b.posts.count}.take(5)

Drawbacks: A DataBase request is made for each user, making this solution very slow.

2) Use SQL code directly with a Join (See for instance this question and answer)

select('users.*, COUNT(posts.id) AS posts_count').joins(:posts).group('users.id').order('posts_count DESC').take(5)

This runs all sorting logic in the DataBase. However:

  • We use a lot of DB-specific code (In PostgreSQL for example we would need other syntax). It would be better to use ActiveRecord methods, if possible.
  • Using an Inner Join means that users without any posts will never be returned. This is a problem when we want to return users without posts as well.

3) Use SQL directly with an Outer Join (see for instance this question and answers)

User.select("users.*, COUNT(posts.id) as posts_count").joins("LEFT OUTER JOIN posts ON posts.user_id = users.id").group("posts.id").order("posts_count DESC")

This also returns users without posts. Drawbacks:

  • Even more DB-specific code as #2, and even harder to read.

4) Use a Counter Cache Column (For a full explanation of this technique, see this Railscasts episode)

basically, create a new column on the User that keeps track of the current count of posts for that user by changing the value in the field each time a new post is created or deleted.

This is very fast and readable. The drawback is that we can only use this after we have defined a new field on the User. For many situations this is acceptable, but it will be harder to make flexible because the users table needs to be changed for this to work per association we might want to create a top-five for. Also, as this is a cached field, there are database manipulations that will not trigger an update on the field.

Is there a nicer(readable and efficient) way to accomplish this? Preferable something that uses built-in ActiveRecord methods.

Community
  • 1
  • 1
Qqwy
  • 5,214
  • 5
  • 42
  • 83
  • There's another option to add to your list with drawbacks. You can add a field `posts_count` to `User` model. If posts are added less often than top five users selected it may be a good attempt. – Glupo Aug 10 '15 at 11:39
  • `In PostgreSQL for example we would need other syntax` – what exactly is DB specific in example #2? – EugZol Aug 10 '15 at 11:43
  • Instead of an explicit `LEFT OUTER JOIN` you might try, `includes(...)` which does an `OUTER JOIN` implicitly. Also, your queries in options 2 and 3 seem generic enough that the syntax may be common across the SQL implementations you are interested in. – lurker Aug 10 '15 at 11:45
  • @Glupo: Ah, yes. A `cache_counter`, right? I will add it to the possibilities in the question post. In many situations that might be a good solution. However, the drawback is of course that the `user` database table needs to be changed for this. – Qqwy Aug 10 '15 at 12:29

4 Answers4

6

Another method, with some limitations that might make it more of a part solution:

User.where(:id => Post.group(:user_id).
                       order("count(*) desc").
                       limit(5).
                       keys)

This would be extremely efficient in database terms at finding the five users with the highest numbers of posts as it only needs to scan an index on the posts table's user_id columns, so would be good for very large data sets. It's also pretty "clean" Rails/ActiveRecord code that ought to be practically database independent.

If returning the Users in their post-count order is critical then a less efficient sorting method could be used once those five have been identified, or the retrieval order of the keys could be used in ruby to sort the returned Users.

David Aldridge
  • 51,479
  • 8
  • 68
  • 96
  • I really like this approach! :-) `count(*)` is new to me and seems very smart for this situation. – Qqwy Aug 10 '15 at 15:53
  • 2
    Well it's certainly efficient, but it's incompatible with applying any conditions to the User model (such as active/inactive), and with the use of multiple order conditions (e.g. order by number of posts, then number of comments), so it is not for every situation. – David Aldridge Aug 10 '15 at 16:07
  • Why is there a .count call in the chain? Will it not just return single value not applicable for .keys? – Pavel Bulanov Jun 01 '16 at 09:09
1

Here's a method worth looking at:

User.joins("left join posts on posts.user_id = users.id").
     group(:id).
     order("count(*) desc").
     limit(5)

It's a little bit manual in the join, but if you knew that at least five users had a post, or didn't want to list any users who did not have a post, then you could use a regular join:

User.joins(:posts).
     group(:id).
     order("count(*) desc").
     limit(5)

The count(*) is not necessarily robust if you had other has_many joins in there, but in that case you'd probably want to generate a query such as:

select ...
from   users ...
order by (select count(*) from posts where posts.user_id = users.id)

p.s. Tested on PostgreSQL. The GROUP BY on the ID column certainly wouldn't work on Oracle, not sure about others.

David Aldridge
  • 51,479
  • 8
  • 68
  • 96
1

This option might be worth looking into, didn't test it, so might need some tweaking.

class Post < ActiveRecord::Base
  belongs_to :user, counter_cache: true
end

Use counter_cache and you'll hit one table in your db.

class User < ActiveRecord::Base
  has_many :posts

  def self.top_5
    order('post_counts DESC').limit(5)
  end
end

Add posts_count integer column in users table with default 0.

class AddPostsCountToUsers < ActiveRecord::Migration
  def change
    add_column :users, :posts_count, :integer, default: 0
  end
end

If you already have existing users in your db.

You'll need to run the following in your console, or make that into a rake task if you need to run it a few times:

User.find_each { |user| User.reset_counters(user.id, :posts) }
neo
  • 4,078
  • 4
  • 25
  • 41
0

You can do like below also-

User.joins(:posts).select('users.*, count(*) as posts_count').group('users.id').order('posts_count')
Vishnu Atrai
  • 2,370
  • 22
  • 24