23

After googling, browsing SO and reading, there doesn't seem to be a Rails-style way to efficiently get only those Parent objects which have at least one Child object (through a has_many :children relation). In plain SQL:

SELECT *
  FROM parents
 WHERE EXISTS (
               SELECT 1
                 FROM children
                WHERE parent_id = parents.id)

The closest I've come is

Parent.all.reject { |parent| parent.children.empty? }

(based on another answer), but it's really inefficient because it runs a separate query for each Parent.

Community
  • 1
  • 1
l0b0
  • 55,365
  • 30
  • 138
  • 223

6 Answers6

57
Parent.joins(:children).uniq.all
l0b0
  • 55,365
  • 30
  • 138
  • 223
Chris Bailey
  • 4,126
  • 24
  • 28
  • 1
    This results in a single SQL statement, and it's short and readable. Awesome. – l0b0 Mar 29 '12 at 10:44
  • 3
    Yes you did. **Parent.joins(:children).uniq.all** is an array and **Parent.joins(:children).uniq** is an ActiveRelation object. Note ActiveRelation objects are lazy and don't execute until explicity requested to. Calling **all** forces the object to evaluate the SQL with the DB – bradgonesurfing Mar 29 '12 at 10:45
  • 4
    Why does this work? I understand SQL but...can someone please explain? – Edward Aug 26 '15 at 04:21
  • Wizardry! How can this be done for multiple has_many associations – Weston Ganger Jun 27 '16 at 16:04
  • pure SQL version: Parent.joins(:children).distinct.all – nazar kuliyev Jun 28 '19 at 05:27
13

As of Rails 5.1, uniq is deprecated and distinct should be used instead.

Parent.joins(:children).distinct

This is a follow-up on Chris Bailey's answer. .all is removed as well from the original answer as it doesn't add anything.

Community
  • 1
  • 1
Fellow Stranger
  • 32,129
  • 35
  • 168
  • 232
5

The accepted answer (Parent.joins(:children).uniq) generates SQL using DISTINCT but it can be slow query. For better performance, you should write SQL using EXISTS:

Parent.where<<-SQL
EXISTS (SELECT * FROM children c WHERE c.parent_id = parents.id)
SQL

EXISTS is much faster than DISTINCT. For example, here is a post model which has comments and likes:

class Post < ApplicationRecord
  has_many :comments
  has_many :likes
end

class Comment < ApplicationRecord
  belongs_to :post
end

class Like < ApplicationRecord
  belongs_to :post
end

In database there are 100 posts and each post has 50 comments and 50 likes. Only one post has no comments and likes:

# Create posts with comments and likes
100.times do |i|
  post = Post.create!(title: "Post #{i}")
  50.times do |j|
    post.comments.create!(content: "Comment #{j} for #{post.title}")
    post.likes.create!(user_name: "User #{j} for #{post.title}")
  end
end

# Create a post without comment and like
Post.create!(title: 'Hidden post')

If you want to get posts which have at least one comment and like, you might write like this:

# NOTE: uniq method will be removed in Rails 5.1
Post.joins(:comments, :likes).distinct

The query above generates SQL like this:

SELECT DISTINCT "posts".* 
FROM "posts" 
INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" 
INNER JOIN "likes" ON "likes"."post_id" = "posts"."id"

But this SQL generates 250000 rows(100 posts * 50 comments * 50 likes) and then filters out duplicated rows, so it could be slow.

In this case you should write like this:

Post.where <<-SQL
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id)
AND
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
SQL

This query generates SQL like this:

SELECT "posts".* 
FROM "posts" 
WHERE (
EXISTS (SELECT * FROM comments c WHERE c.post_id = posts.id) 
AND 
EXISTS (SELECT * FROM likes l WHERE l.post_id = posts.id)
)

This query does not generate useless duplicated rows, so it could be faster.

Here is benchmark:

              user     system      total        real
Uniq:     0.010000   0.000000   0.010000 (  0.074396)
Exists:   0.000000   0.000000   0.000000 (  0.003711)

It shows EXISTS is 20.047661 times faster than DISTINCT.

I pushed the sample application in GitHub, so you can confirm the difference by yourself:

https://github.com/JunichiIto/exists-query-sandbox

Junichi Ito
  • 2,438
  • 1
  • 23
  • 46
3

I have just modified this solution for your need.

Parent.joins("left join childrens on childrends.parent_id = parents.id").where("childrents.parent_id is not null")
Community
  • 1
  • 1
Soundar Rathinasamy
  • 6,658
  • 6
  • 29
  • 47
2

You just want an inner join with a distinct qualifier

SELECT DISTINCT(*) 
FROM parents
JOIN children
ON children.parent_id = parents.id

This can be done in standard active record as

Parent.joins(:children).uniq

However if you want the more complex result of find all parents with no children you need an outer join

Parent.joins("LEFT OUTER JOIN children on children.parent_id = parent.id").
where(:children => { :id => nil })

which is a solution which sux for many reasons. I recommend Ernie Millers squeel library which will allow you to do

Parent.joins{children.outer}.where{children.id == nil}
bradgonesurfing
  • 30,949
  • 17
  • 114
  • 217
  • look at @chris-bailey's answer when you need to use a `JOIN` for some reason while writing it in a clean and concise way. – simonmenke Mar 29 '12 at 10:20
  • This is equivalent to @ChrisBailey's answer - No go. – l0b0 Mar 29 '12 at 10:39
  • It is the same solution I proposed. However it is very rare to only need to do a join. Squeel shines for complex queries that AR get's very ugly with. – bradgonesurfing Mar 29 '12 at 10:39
1

try including the children with #includes()

Parent.includes(:children).all.reject { |parent| parent.children.empty? }

This will make 2 queries:

SELECT * FROM parents;
SELECT * FROM children WHERE parent_id IN (5, 6, 8, ...);

[UPDATE]

The above solution is usefull when you need to have the Child objects loaded. But children.empty? can also use a counter cache1,2 to determine the amount of children.

For this to work you need to add a new column to the parents table:

# a new migration
def up
  change_table :parents do |t|
    t.integer :children_count, :default => 0
  end

  Parent.reset_column_information
  Parent.all.each do |p|
    Parent.update_counters p.id, :children_count => p.children.length
  end
end

def down
  change_table :parents do |t|
    t.remove :children_count
  end
end

Now change your Child model:

class Child
  belongs_to :parent, :counter_cache => true
end

At this point you can use size and empty? without touching the children table:

Parent.all.reject { |parent| parent.children.empty? }

Note that length doesn't use the counter cache whereas size and empty? do.

simonmenke
  • 2,819
  • 19
  • 28
  • This is a bad answer as the correct answer is an inner join. The above is very inefficient and make multiple queries and ruby loops. – bradgonesurfing Mar 29 '12 at 10:11
  • 2
    @bradgonesurfing No my first solution will never make multiple queries while looping through the parents. (note the `.includes(:children)`) It is true though that Rails might turn the 2 queries above into 1 query (with a JOIN) when it needs too. – simonmenke Mar 29 '12 at 10:16
  • Never said it would make multiple queries whilst looping. Your solution does makes "multiple queries", two of them and then you loop through the returned set in ruby using reject. That is very slow compared to a fast inner join in the DB. – bradgonesurfing Mar 29 '12 at 10:43
  • However to be fair in general the solution you propose does fix the general **1+N** problem that the OP sees in his original solution. It's just not the right solution for this specific problem :) – bradgonesurfing Mar 29 '12 at 10:49