2

I'm trying to figure out the count of certain types of articles. I have a very inefficient query:

Article.where(status: 'Finished').select{|x| x.tags & Article::EXPERT_TAGS}.size

In my quest to be a better programmer, I'm wondering how to make this a faster query. tags is an array of strings in Article, and Article::EXPERT_TAGS is another array of strings. I want to find the intersection of the arrays, and get the resulting record count.

EDIT: Article::EXPERT_TAGS and article.tags are defined as Mongo arrays. These arrays hold strings, and I believe they are serialized strings. For example: Article.first.tags = ["Guest Writer", "News Article", "Press Release"]. Unfortunately this is not set up properly as a separate table of Tags.

2nd EDIT: I'm using MongoDB, so actually it is using a MongoWrapper like MongoMapper or mongoid, not ActiveRecord. This is an error on my part, sorry! Because of this error, it screws up the analysis of this question. Thanks PinnyM for pointing out the error!

mrzasa
  • 22,895
  • 11
  • 56
  • 94
John
  • 485
  • 2
  • 5
  • 15
  • How is this array of strings stored in the database - as a separate table (table schema, please) or an Array field? And what DBMS are you using? – PinnyM Sep 16 '13 at 15:23
  • 1
    I appears that John is storing the list of `tags` in a single column by serializing the array (@John, please keep me honest) – Rajesh Kolappakam Sep 16 '13 at 15:24
  • This is stored as an array in a single column in MongoDB, not as a separate table (which would have made life much easier in retrospect). – John Sep 16 '13 at 15:39
  • 1
    If you are using MongoDB, why not use Mongoid instead of ActiveRecord? – PinnyM Sep 16 '13 at 17:30

3 Answers3

1

Since you are using MongoDB, you could also consider a MongoDB-specific solution (aggregation framework) for the array intersection, so that you could get the database to do all the work before fetching the final result.

See this SO thread How to check if an array field is a part of another array in MongoDB?

Community
  • 1
  • 1
Rajesh Kolappakam
  • 2,095
  • 14
  • 12
0

Assuming that the entire tags list is stored in a single database field and that you want to keep it that way, I don't see much scope of improvement, since you need to get all the data into Ruby for processing.

However, there is one problem with your database query

Article.where(status: 'Finished')

# This translates into the following query
SELECT * FROM articles WHERE status = 'Finished'

Essentially, you are fetching all the columns whereas you only need the tags column for your process. So, you can use pluck like this:

Article.where(status: 'Finished').pluck(:tags)

# This translates into the following query
SELECT tags FROM articles WHERE status = 'Finished'
Rajesh Kolappakam
  • 2,095
  • 14
  • 12
  • ActiveRecord has built-in support to serialize `Array` and `Hash` into a single column. If you define a `text` column for `tags` and you assign `tags = @your_array` it gets serialized and saved to database and when you query the record, it gets deserialized into an `Array` object. See this SO thread, for example, http://stackoverflow.com/questions/6694432/using-rails-serialize-to-save-hash-to-database – Rajesh Kolappakam Sep 16 '13 at 15:39
0

I answered a question regarding general intersection like queries in ActiveRecord here.

Extracted below:


The following is a general approach I use for constructing intersection like queries in ActiveRecord:

class Service < ActiveRecord::Base
  belongs_to :person

  def self.with_types(*types)
    where(service_type: types)
  end
end

class City < ActiveRecord::Base
  has_and_belongs_to_many :services
  has_many :people, inverse_of: :city
end

class Person < ActiveRecord::Base
  belongs_to :city, inverse_of: :people

  def self.with_cities(cities)
    where(city_id: cities)
  end

  # intersection like query
  def self.with_all_service_types(*types)
    types.map { |t|
      joins(:services).merge(Service.with_types t).select(:id)
    }.reduce(scoped) { |scope, subquery|
      scope.where(id: subquery)
    }
  end
end

Person.with_all_service_types(1, 2)
Person.with_all_service_types(1, 2).with_cities(City.where(name: 'Gold Coast'))

It will generate SQL of the form:

SELECT "people".*
  FROM "people"
 WHERE "people"."id" in (SELECT "people"."id" FROM ...)
   AND "people"."id" in (SELECT ...)
   AND ...

You can create as many subqueries as required with the above approach based on any conditions/joins etc so long as each subquery returns the id of a matching person in its result set.

Each subquery result set will be AND'ed together thus restricting the matching set to the intersection of all of the subqueries.

Community
  • 1
  • 1
Andrew Hacking
  • 6,296
  • 31
  • 37