1

I'm trying to query a table, listings, based on a has_many relationship with listing_options. Listing options have different types and I'd like to make a SQL query that can match listings with listing_options. Here's the SQL query atm:

SELECT "listings".* FROM "listings" INNER JOIN "listing_options" ON 
"listing_options"."listing_id" = "listings"."id" WHERE ("listings".state IS NULL) AND
(listing_options.option_id IN ('1','2')) AND
(listing_options.option_id IN ('4','5','7')) LIMIT 12 OFFSET 0

Let's say I have two listings, with option values (1,5) and (1,6), respectively. I expect the query to return the first listing but not the second. The trouble is the query considers only one option value at a time, i.e. a option value can't be both 1 and 5, thus the query returns empty.

ActiveRecord to generate the SQL:

Listing.joins(:listing_options).where("listing_options.option_id IN (?)", [1,2]).where("listing_options.option_id IN (?)", [4,5,7])

Note: structuring the query like this results in duplicate records being returned if one part of the query is omitted, which may be part of the issue.

Same query with includes (also no results returned):

Listing.includes(:listing_options).where("listing_options.option_id IN (?)", [1,2]).references(:listing_options).where("listing_options.option_id IN (?)", [4,5,7])

class Listing
has_many :listing_options
has_and_belongs_to_many :options, join_table: :listing_options

class Option
belongs_to :option_type, inverse_of: :option

I've tried inversing the query to no avail (although doing it this way would also be more complicated as I would need to specify the option_ids to avoid based on option types).

Example:

Listing.includes(:listing_options).where("listing_options.option_id NOT IN (?)", [3,6,8]).references(:listing_options)

This query returns listings - more than it should. The query should be reasonably efficient for large numbers of listings and up to four types of listing options (additional queries on listing options). For this reason I want to do just one query rather than repetitively querying results.

edit - Solution - SQL from @ErwinBrandstetter written as search scope

.where('EXISTS (SELECT 1 FROM listing_options o1 JOIN listing_options o2 USING (listing_id) WHERE o1.listing_id = id AND o1.option_id IN (?) AND o2.option_id IN (?))', [1,2], [4,5,7])
bjorn
  • 175
  • 2
  • 9
  • You're trying to find listings that have at least one of options `[1,2]` and at least one of options `[4,5,7]`? So something with options `[1,5]` would be okay but `[1,2,3]` and `[4,7,11]` would not? – mu is too short Jul 25 '14 at 23:06
  • This is a special case of [relational division](http://stackoverflow.com/questions/tagged/relational-division). – Erwin Brandstetter Jul 26 '14 at 03:54

2 Answers2

3

The query is not returning any result because you are specifying 2 opposite conditions: listing_options.option_id IN ('1','2') and listing_options.option_id IN ('4','5','7'). When you are joining tables, the where clause is applied individually for every combination of the joined tables.

To get the result that you want (all listings that have different options), you would either need to join the listing_options table several times, like this:

SELECT l.* FROM listings l
    INNER JOIN listing_options o1 ON l.id = o1.listing_id
    INNER JOIN listing_options o2 ON l.id = o2.listing_id
  WHERE l.state IS NULL
    AND o1.option_id IN ('1','2')
    AND o2.option_id IN ('4','5','7')

or alternatively you can use subselects, like this:

SELECT * FROM listings
  WHERE state IS NULL
    AND id IN (SELECT listing_id FROM listing_options WHERE option_id IN ('1','2'))
    AND id IN (SELECT listing_id FROM listing_options WHERE option_id IN ('4','5','7'))

As you can see, the second query is easier to generate with ActiveRecord, specially if your number of options is dynamic:

options = [[1,2], [4,5,7]]
query = Listing.where(state: nil)
options.each do |array|
  query = query.where('id IN (SELECT listing_id FROM listing_options WHERE option_id IN (?))', array)
end
query.load
rabusmar
  • 4,084
  • 1
  • 25
  • 29
1

I suggest an EXISTS semi-join wherein you join listing_options to itself (as many times as needed).

SELECT *
FROM   listings l
WHERE  state IS NULL
AND    EXISTS (
   SELECT 1
   FROM   listing_options o1
   JOIN   listing_options o2 USING (listing_id)
   WHERE  o1.listing_id = l.id
   AND    o1.option_id IN (1,2)
   AND    o2.option_id IN (4,5,7)
   );

You can only use multiple joins like demonstrated in @rabusmar's first query (or in "6) Sean" in the linked answer below) if there can be at most one matching option_id per group. Else you would multiply rows, which would require re-grouping. Expensive, uncalled work.

EXISTS is typically faster than IN, while also not exhibiting tricky behavior with NULL values.

We assembled an arsenal of techniques for relational division under this related question:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228