1

I have three tables offers, sports and the join table offers_sports.

class Offer < ActiveRecord::Base
  has_and_belongs_to_many :sports
end

class Sport < ActiveRecord::Base
  has_and_belongs_to_many :offers
end

I want to select offers that include a given array of sport names. They must contain all of the sports but may have more.

Lets say I have these three offers:

light:
  - "Yoga"
  - "Bodyboarding"
medium:
  - "Yoga"
  - "Bodyboarding"
  - "Surfing"
all:
  - "Yoga"
  - "Bodyboarding"
  - "Surfing"
  - "Parasailing"
  - "Skydiving"

Given the array ["Bodyboarding", "Surfing"] I would want to get medium and all but not light.

I have tried something along the lines of this answer but I get zero rows in the result:

Offer.joins(:sports)
     .where(sports: { name: ["Bodyboarding", "Surfing"] })
     .group("sports.name")
     .having("COUNT(distinct sports.name) = 2")

Translated to SQL:

SELECT "offers".* 
FROM "offers" 
INNER JOIN "offers_sports" ON "offers_sports"."offer_id" = "offers"."id"     
INNER JOIN "sports" ON "sports"."id" = "offers_sports"."sport_id" 
  WHERE "sports"."name" IN ('Bodyboarding', 'Surfing') 
GROUP BY sports.name 
HAVING COUNT(distinct sports.name) = 2;

An ActiveRecord answer would be nice but I'll settle for just SQL, preferably Postgres compatible.

Data:

offers
======================
id | name
----------------------
1  | light
2  | medium
3  | all
4  | extreme

sports
======================
id | name
----------------------
1  | "Yoga"
2  | "Bodyboarding"
3  | "Surfing"
4  | "Parasailing"
5  | "Skydiving"

offers_sports
======================
offer_id | sport_id
----------------------
1        | 1
1        | 2
2        | 1
2        | 2
2        | 3
3        | 1
3        | 2
3        | 3
3        | 4
3        | 5
4        | 3
4        | 4
4        | 5
Community
  • 1
  • 1
max
  • 96,212
  • 14
  • 104
  • 165
  • Does the SQL work? It looks correct. – Gordon Linoff Mar 21 '16 at 13:33
  • It returns 0 rows. @GordonLinoff – max Mar 21 '16 at 13:44
  • You're almost there: group by the "offers" fields (replace the * by explicit field names and repeat those in the group by). You're now grouping by sports.name and naturally, the distinct count of sports.name per sports.name is always 1. – Henk Kok Mar 21 '16 at 13:50
  • @HenkKok still gives 0 rows: `SELECT offers.name, offers.id, sports.id FROM "offers" INNER JOIN "offers_sports" ON "offers_sports"."offer_id" = "offers"."id" INNER JOIN "sports" ON "sports"."id" = "offers_sports"."sport_id" WHERE "sports"."name" IN ('Bodyboarding', 'Surfing') GROUP BY offers.name, offers.id, sports.name HAVING COUNT(distinct sports.name) = 2` – max Mar 21 '16 at 13:54
  • That's because you still included sports.name in the group by, that column has to go from the group by; also, sports.id must be removed from the SELECT clause. If you need the offers plus the sports in the final result, first do the select on the offers (that are linked to all 2 sports) and then use that select as an inline view to join to sports. – Henk Kok Mar 21 '16 at 13:57

2 Answers2

2

Group by offer.id, not by sports.name (or sports.id):

SELECT o.*
FROM   sports        s
JOIN   offers_sports os ON os.sport_id = s.id
JOIN   offers        o  ON os.offer_id = o.id
WHERE  s.name IN ('Bodyboarding', 'Surfing') 
GROUP  BY o.id  -- !!
HAVING count(*) = 2;

Assuming the typical implementation:

  • offer.id and sports.id are defined as primary key.
  • sports.name is defined unique.
  • (sport_id, offer_id) in offers_sports is defined unique (or PK).

You don't need DISTINCT in the count. And count(*) is even a bit cheaper, yet.

Related answer with an arsenal of possible techniques:


Added by @max (the OP) - this is the above query rolled into ActiveRecord:

class Offer < ActiveRecord::Base
  has_and_belongs_to_many :sports
  def self.includes_sports(*sport_names)
    joins(:sports)
      .where(sports: { name: sport_names })
      .group('offers.id')
      .having("count(*) = ?", sport_names.size)
  end
end
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
2

One way to do it is using arrays and the array_agg aggregate function.

SELECT "offers".*, array_agg("sports"."name") as spnames 
FROM "offers" 
INNER JOIN "offers_sports" ON "offers_sports"."offer_id" = "offers"."id"     
INNER JOIN "sports" ON "sports"."id" = "offers_sports"."sport_id" 
GROUP BY "offers"."id" HAVING array_agg("sports"."name")::text[] @> ARRAY['Bodyboarding','Surfing']::text[];

returns:

 id |  name  |                      spnames                      
----+--------+---------------------------------------------------
  2 | medium | {Yoga,Bodyboarding,Surfing}
  3 | all    | {Yoga,Bodyboarding,Surfing,Parasailing,Skydiving}
(2 rows)

The @> operator means that the array on the left must contain all the elements from the one on the right, but may contain more. The spnames column is just for show, but you can remove it safely.

There are two things you must be very mindful of with this.

  1. Even with Postgres 9.4 (I haven't tried 9.5 yet) type conversion for comparing arrays is sloppy and often errors out, telling you it can't find a way to convert them to comparable values, so as you can see in the example I've manually cast both sides using ::text[].

  2. I have no idea what the level of support for array parameters is Ruby, nor the RoR framework, so you may end-up having to manually escape the strings (if input by user) and form the array using the ARRAY[] syntax.

coladict
  • 4,799
  • 1
  • 16
  • 27
  • It does work and its not a bad answer - however Erwin's was much easier to integrate. You would basically need to generate the having clause as a string and bind the values to escape it. Not really hard but its still tricky to get the association loading right on this one. Thanks! – max Mar 21 '16 at 18:51