1

I want a user to be able to find all posts that have one or more tags. And I'd like the tags to be additive criteria, so for example you could search for posts that have just the 'News' tag, or you could search for posts that have both the 'News' and 'Science' tags.

Currently what I have, and it works, is a Post model, a Tag model, and a join model called Marking. Post has_many :tags, through: :markings. I get what I need by passing an array of Tag ids to a Post class method:

post.rb

def self.from_tag_id_array array
  post_array = []
  Marking.where(tag_id: array).group_by(&:post_id).each do |p_id,m_array|
    post_array << p_id if m_array.map(&:tag_id).sort & array.sort == array.sort
  end
  where id: post_array
end

This seems like a clunky way to get there. Is there a way I can do this with a scope on an association or something of the like?

eeeeeean
  • 1,774
  • 1
  • 17
  • 31
  • 1
    You can try something similar to this. http://stackoverflow.com/questions/25606775/how-to-find-posts-tagged-with-more-than-one-tag-in-rails-and-postgresql – Jayaprakash Oct 08 '16 at 08:16

1 Answers1

2

So the general rule of thumb with building these kinds of queries is to minimize work in "Ruby-land" and maximize work in "Database-land". In your solution above, you're fetching a set of markings with any tags in the set array, which presumably will be a very large set (all posts that have any of those tags). This is represented in a ruby array and processed (group_by is in Ruby-world, group is the equivalent in Database-land).

So aside from being hard-to-read, that solution is going to be slow for any large set of markings.

There are a couple ways to solve the problem without doing any heavy lifting in Ruby-world. One way is using subqueries, like this:

scope :with_tag_ids, ->(tag_ids) {
  tag_ids.map { |tag_id|
    joins(:markings).where(markings: { tag_id: tag_id })
  }.reduce(all) { |scope, subquery| scope.where(id: subquery) }
}

This generates a query like this (again for tag_ids 5 and 8)

SELECT "posts".*
FROM "posts"
WHERE "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 5)
  AND "posts"."id" IN (SELECT "posts"."id" FROM "posts" INNER JOIN "markings" ON "markings"."post_id" = "posts"."id" WHERE "markings"."tag_id" = 8)

Note that since everything here is calculated directly in SQL, no arrays are generated or processed in Ruby. This will generally scale much better.

Alternatively, you can use COUNT and do it in a single query without subqueries:

scope :with_tag_ids, ->(tag_ids) {
  joins(:markings).where(markings: { tag_id: tag_ids }).
  group(:post_id).having('COUNT(posts.id) = ?', tag_ids.count)
}

Which generates SQL like this:

SELECT "posts".*
FROM "posts"
INNER JOIN "markings" ON "markings"."post_id" = "posts"."id"
WHERE "markings"."tag_id" IN (5, 8)
GROUP BY "post_id"
HAVING (COUNT(posts.id) = 2)

This assumes that you don't have multiple markings with the same pair of tag_id and post_id, which would throw off the count.

I would imagine that the last solution is probably the most efficient, but you should try different solutions and see what works best for your data.

See also: Query intersection with activerecord

Community
  • 1
  • 1
Chris Salzberg
  • 27,099
  • 4
  • 75
  • 82