107

I've written a couple of complex queries (at least to me) with Ruby on Rail's query interface:

watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id})
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id})

Both of these queries work fine by themselves. Both return Post objects. I would like to combine these posts into a single ActiveRelation. Since there could be hundreds of thousands of posts at some point, this needs to be done at the database level. If it were a MySQL query, I could simply user the UNION operator. Does anybody know if I can do something similar with RoR's query interface?

divibisan
  • 11,659
  • 11
  • 40
  • 58
LandonSchropp
  • 10,084
  • 22
  • 86
  • 149
  • You should be able to use [scope](http://api.rubyonrails.org/classes/ActiveRecord/NamedScope/ClassMethods.html#method-i-scope). Create 2 scopes and then call them both like `Post.watched_news_posts.watched_topic_posts`. You may need to send in params to the scopes for things like the `:user_id` and `:topic`. – Zabba Jul 14 '11 at 01:44
  • 7
    Thanks for the suggestion. According to the docs, "A scope represents a narrowing of a database query". In my case, I'm not looking for posts that are in both watched_news_posts and watched_topic_posts. Rather, I'm looking for posts that are in watched_news_posts or watched_topic_posts, with no duplicates allowed. Is this still possible to accomplish with scopes? – LandonSchropp Jul 14 '11 at 03:01
  • 1
    Not really possible out-of-the-box. There is a plugin on github called union but it uses old-school syntax (class method and hash-style query params), if that's cool with you I'd say go with it... otherwise write it out the long way in a find_by_sql in your scope. – jenjenut233 Jul 14 '11 at 03:28
  • 1
    I agree with jenjenut233, and I would think you could do something like `find_by_sql("#{watched_news_posts.to_sql} UNION #{watched_topic_posts.to_sql}")`. I haven't tested that out, so let me know how it goes if you try it. Also, there's probably some ARel functionality that would work. – Wizard of Ogz Jul 14 '11 at 16:58
  • 3
    Well I rewrote the queries as SQL queries. They work now, but unfortunately `find_by_sql` can't be used with other chainable queries, which means I now have to rewrite my will_paginate filters and queries as well. Why doesn't ActiveRecord support a `union` operation? – LandonSchropp Jul 15 '11 at 01:43
  • You should be able to accomplish this with a left outer join – mc. Sep 15 '15 at 18:39

17 Answers17

108

Here's a quick little module I wrote that allows you to UNION multiple scopes. It also returns the results as an instance of ActiveRecord::Relation.

module ActiveRecord::UnionScope
  def self.included(base)
    base.send :extend, ClassMethods
  end

  module ClassMethods
    def union_scope(*scopes)
      id_column = "#{table_name}.id"
      sub_query = scopes.map { |s| s.select(id_column).to_sql }.join(" UNION ")
      where "#{id_column} IN (#{sub_query})"
    end
  end
end

Here's the gist: https://gist.github.com/tlowrimore/5162327

Edit:

As requested, here's an example of how UnionScope works:

class Property < ActiveRecord::Base
  include ActiveRecord::UnionScope

  # some silly, contrived scopes
  scope :active_nearby,     -> { where(active: true).where('distance <= 25') }
  scope :inactive_distant,  -> { where(active: false).where('distance >= 200') }

  # A union of the aforementioned scopes
  scope :active_near_and_inactive_distant, -> { union_scope(active_nearby, inactive_distant) }
end
Community
  • 1
  • 1
Tim Lowrimore
  • 2,024
  • 1
  • 17
  • 9
  • 3
    This is really a way more complete answer the other others listed above. Works great! – hayesgm Aug 15 '13 at 04:36
  • Example of usage would be nice. – ciembor Feb 18 '14 at 15:59
  • As requested, I've added an example. – Tim Lowrimore Mar 18 '14 at 18:12
  • I get an exception with SQLite: only a single result allowed for a SELECT that is part of an expression. – Sarah Vessels Apr 11 '14 at 02:07
  • @SarahVessels That error seems to imply that the nested select statements are querying more than one column; however, the `union_scope` definition explicitly selects the `id` column and nothing more. I'd be curious to see what the resulting SQL is for your scenario. Have you tried calling `to_sql` on your scope to see what the generated SQL looks like? – Tim Lowrimore Apr 14 '14 at 16:07
  • 3
    The solution is "almost" correct and I gave it a +1, but I ran into a problem that I fixed here: https://gist.github.com/lsiden/260167a4d3574a580d97 – Lawrence I. Siden Jul 21 '14 at 15:43
  • 8
    Quick warning: this method is highly problematic from a performance perspective with MySQL, since the subquery will be counted as dependent and executed for each record in the table (see http://www.percona.com/blog/2010/10/25/mysql-limitations-part-3-subqueries/). – shosti Sep 16 '14 at 23:01
86

I also have encountered this problem, and now my go-to strategy is to generate SQL (by hand or using to_sql on an existing scope) and then stick it in the from clause. I can't guarantee it's any more efficient than your accepted method, but it's relatively easy on the eyes and gives you a normal ARel object back.

watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id})
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id})

Post.from("(#{watched_news_posts.to_sql} UNION #{watched_topic_posts.to_sql}) AS posts")

You can do this with two different models as well, but you need to make sure they both "look the same" inside the UNION -- you can use select on both queries to make sure they will produce the same columns.

topics = Topic.select('user_id AS author_id, description AS body, created_at')
comments = Comment.select('author_id, body, created_at')

Comment.from("(#{comments.to_sql} UNION #{topics.to_sql}) AS comments")
Elliot Nelson
  • 11,371
  • 3
  • 30
  • 44
  • suppose if we have two different model then please let me know what will be the query for unoin. – Chitra Jan 14 '16 at 10:56
  • 1
    Very helpful answer. For future readers, remember the final "AS comments" portion because activerecord constructs the query as 'SELECT "comments"."*" FROM"... if you don't specify the name of the unioned set OR specify a different name like "AS foo", the final sql execution will fail. – HeyZiko Oct 26 '16 at 17:49
  • 1
    This was exactly what I was looking for. I extended ActiveRecord::Relation to support `#or` in my Rails 4 project. Assuming same model: `klass.from("(#{to_sql} union #{other_relation.to_sql}) as #{table_name}")` – M. Wyatt Sep 28 '18 at 19:07
  • Would be possible to have a third model (without a real table) to work like an uniform "interface" for the first two models? Eg. Wine and Soda, and a thrid model called Drink, which will union the first two for query only purpose. – Theo B Mar 15 '22 at 15:26
11

You could also use Brian Hempel's active_record_union gem that extends ActiveRecord with an union method for scopes.

Your query would be like this:

Post.joins(:news => :watched).
  where(:watched => {:user_id => id}).
  union(Post.joins(:post_topic_relationships => {:topic => :watched}
    .where(:watched => {:user_id => id}))

Hopefully this will be eventually merged into ActiveRecord some day.

Community
  • 1
  • 1
dgilperez
  • 10,716
  • 8
  • 68
  • 96
11

Based on Olives' answer, I did come up with another solution to this problem. It feels a little bit like a hack, but it returns an instance of ActiveRelation, which is what I was after in the first place.

Post.where('posts.id IN 
      (
        SELECT post_topic_relationships.post_id FROM post_topic_relationships
          INNER JOIN "watched" ON "watched"."watched_item_id" = "post_topic_relationships"."topic_id" AND "watched"."watched_item_type" = "Topic" WHERE "watched"."user_id" = ?
      )
      OR posts.id IN
      (
        SELECT "posts"."id" FROM "posts" INNER JOIN "news" ON "news"."id" = "posts"."news_id" 
        INNER JOIN "watched" ON "watched"."watched_item_id" = "news"."id" AND "watched"."watched_item_type" = "News" WHERE "watched"."user_id" = ?
      )', id, id)

I'd still appreciate it if anybody has any suggestions to optimize this or improve the performance, because it's essentially executing three queries and feels a little redundant.

LandonSchropp
  • 10,084
  • 22
  • 86
  • 149
  • How could I do the same thing with this: https://gist.github.com/2241307 So that it creates an AR::Relation class rather than an Array class? – Marc Mar 29 '12 at 18:11
6

How about...

def union(scope1, scope2)
  ids = scope1.pluck(:id) + scope2.pluck(:id)
  where(id: ids.uniq)
end
sth
  • 222,467
  • 53
  • 283
  • 367
Richard Wan
  • 125
  • 1
  • 1
6

Could you use an OR instead of a UNION?

Then you could do something like:

Post.joins(:news => :watched, :post_topic_relationships => {:topic => :watched})
.where("watched.user_id = :id OR topic_watched.user_id = :id", :id => id)

(Since you are joins the watched table twice I'm not too sure what the names of the tables will be for the query)

Since there are a lot of joins, it might also be quite heavy on the database, but it might be able to be optimized.

Kyle d'Oliveira
  • 6,382
  • 1
  • 27
  • 33
  • 3
    Sorry to get back to you so late, but I've been on vacation for the last couple days. The problem I had when I tried your answer was the joins method was causing both tables to be joined, rather than two separate queries which could then be compared. However, your idea was sound and did give me another idea. Thanks for the help. – LandonSchropp Jul 18 '11 at 23:45
  • select using OR is slower than than UNION, wondering any solution for UNION instead – Nich Jan 21 '16 at 01:41
6

Arguably, this improves readability, but not necessarily performance:

def my_posts
  Post.where <<-SQL, self.id, self.id
    posts.id IN 
    (SELECT post_topic_relationships.post_id FROM post_topic_relationships
    INNER JOIN watched ON watched.watched_item_id = post_topic_relationships.topic_id 
    AND watched.watched_item_type = "Topic" 
    AND watched.user_id = ?
    UNION
    SELECT posts.id FROM posts 
    INNER JOIN news ON news.id = posts.news_id 
    INNER JOIN watched ON watched.watched_item_id = news.id 
    AND watched.watched_item_type = "News" 
    AND watched.user_id = ?)
  SQL
end

This method returns an ActiveRecord::Relation, so you could call it like this:

my_posts.order("watched_item_type, post.id DESC")
richardsun
  • 3,245
  • 1
  • 18
  • 22
2

There is an active_record_union gem. Might be helpful

https://github.com/brianhempel/active_record_union

With ActiveRecordUnion, we can do:

the current user's (draft) posts and all published posts from anyone current_user.posts.union(Post.published) Which is equivalent to the following SQL:

SELECT "posts".* FROM (
  SELECT "posts".* FROM "posts"  WHERE "posts"."user_id" = 1
  UNION
  SELECT "posts".* FROM "posts"  WHERE (published_at < '2014-07-19 16:04:21.918366')
) posts
Community
  • 1
  • 1
1

In a similar case I summed two arrays and used Kaminari:paginate_array(). Very nice and working solution. I was unable to use where(), because I need to sum two results with different order() on the same table.

sekrett
  • 1,205
  • 1
  • 15
  • 17
1

Heres how I joined SQL queries using UNION on my own ruby on rails application.

You can use the below as inspiration on your own code.

class Preference < ApplicationRecord
  scope :for, ->(object) { where(preferenceable: object) }
end

Below is the UNION where i joined the scopes together.

  def zone_preferences
    zone = Zone.find params[:zone_id]
    zone_sql = Preference.for(zone).to_sql
    region_sql = Preference.for(zone.region).to_sql
    operator_sql = Preference.for(Operator.current).to_sql

    Preference.from("(#{zone_sql} UNION #{region_sql} UNION #{operator_sql}) AS preferences")
  end
joeyk16
  • 1,357
  • 22
  • 49
1

Less problems and easier to follow:

    def union_scope(*scopes)
      scopes[1..-1].inject(where(id: scopes.first)) { |all, scope| all.or(where(id: scope)) }
    end

So in the end:

union_scope(watched_news_posts, watched_topic_posts)
Dmitry Polushkin
  • 3,283
  • 1
  • 38
  • 44
  • 1
    I changed it slightly to: `scopes.drop(1).reduce(where(id: scopes.first)) { |query, scope| query.or(where(id: scope)) }` Thx! – eikes Feb 12 '20 at 13:02
1
gem 'active_record_extended' 

Also has a set of union helpers among many others.

1

If you don't want to use SQL syntax inside your code, here's solution with arel

watched_news_posts = Post.joins(:news => :watched).where(:watched => {:user_id => id}).arel
watched_topic_posts = Post.joins(:post_topic_relationships => {:topic => :watched}).where(:watched => {:user_id => id}).arel

results = Arel::Nodes::Union.new(watched_news_posts, watched_topic_posts)
from(Post.arel_table.create_table_alias(results, :posts))
Vlad Hilko
  • 1,104
  • 12
  • 17
0

I would just run the two queries you need and combine the arrays of records that are returned:

@posts = watched_news_posts + watched_topics_posts

Or, at the least test it out. Do you think the array combination in ruby will be far too slow? Looking at the suggested queries to get around the problem, I'm not convinced that there will be that significant of a performance difference.

j0k
  • 22,600
  • 28
  • 79
  • 90
0

Elliot Nelson answered good, except the case where some of the relations are empty. I would do something like that:

def union_2_relations(relation1,relation2)
sql = ""
if relation1.any? && relation2.any?
  sql = "(#{relation1.to_sql}) UNION (#{relation2.to_sql}) as #{relation1.klass.table_name}"
elsif relation1.any?
  sql = relation1.to_sql
elsif relation2.any?
  sql = relation2.to_sql
end
relation1.klass.from(sql)

end

Ehud
  • 105
  • 5
0

When we add UNION to the scopes, it breaks at time due to order_by clause added before the UNION.

So I changed it in a way to give it a UNION effect.

module UnionScope
  def self.included(base)
    base.send(:extend, ClassMethods)
  end

  module ClassMethods
    def union_scope(*scopes)
      id_column = "#{table_name}.id"
      sub_query = scopes.map { |s| s.pluck(:id) }.flatten
      where("#{id_column} IN (?)", sub_query)
    end
  end
end

And then use it like this in any model

class Model
  include UnionScope
  scope :union_of_scopeA_scopeB, -> { union_scope(scopeA, scopeB) }
end
Haroon Khalid
  • 155
  • 2
  • 8
0

Tim's answer is great. It uses the ids of the scopes in the WHERE clause. As shosti reports, this method is problematic in terms of performance because all ids need to be generated during query execution. This is why, I prefer joeyk16 answer. Here a generalized module:

module ActiveRecord::UnionScope
  def self.included(base)
    base.send :extend, ClassMethods
  end

  module ClassMethods
    def self.union(*scopes)
      self.from("(#{scopes.map(&:to_sql).join(' UNION ')}) AS #{self.table_name}")
    end 
  end
end
Motine
  • 1,638
  • 18
  • 18