2

How to create a query which selects products of given features where feature statement is formed by "and" or "or" condition depending on a group they belong to?

Description of the situation

  1. There is a store with products.
  2. Products may have features or not.
  3. A customer looks for specific features of products which means filling out the form and sending an array of feature ids.
  4. In the database, each feature belongs to only one group of features.
  5. The first group (disjunction attribute is true, called "OR") allows to display the product if one of the features matches any feature submitted by the customer.
    Example: selecting shapes: circle, square, triangle displays products which are circles or squares or triangles.
  6. The second group (disjunction attribute is false, called "AND") allows to display the product only if the product has all of the features submitted by the customer.
    Example: selecting colors: red, green, blue displays products which are red and green and blue.

Test environment

http://sqlfiddle.com/#!12/f4db7

"OR" Query
It works except for those product which have no features.

SELECT product_id 
FROM product_features 
WHERE product_features.feature_id IN ( 
    SELECT feature_id FROM features 
    LEFT JOIN feature_groups 
        ON features.feature_group_id = feature_groups.feature_group_id 
    WHERE feature_id IN (11, 12, 13) AND feature_groups.disjunction = TRUE 
)
GROUP BY product_id

"AND" Query
This query cannot be used because the number of features where disjunction is false is not known.

SELECT product_id FROM product_features
WHERE feature_id IN (43, 53, 63)
GROUP BY product_id
HAVING COUNT(DISTINCT feature_id) = 3
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
rafis
  • 233
  • 2
  • 4
  • 10
  • 'It works except for those product which have no features.' - if they have no features shouldn't they be eliminated by the "OR" query? – PinnyM Sep 11 '13 at 16:27
  • A "product with no features" would *never* show up in either of your use cases according to your description. – Erwin Brandstetter Sep 11 '13 at 17:00
  • @PinnyM I should have used a different expression and describe it better. The query works fine when the customer selects at least one feature. Then indeed the "OR" query should eliminate the products with no features. If, however, the customer doesn't select any feature and looks for products, then all products should be shown. And in this case, using the "OR" query, doesn't work as needed. Please keep in mind that feature ids set in WHERE IN clause are not only "disjunction is true" type, the query filters them on the fly, so it may happen none of those ids can be used. – rafis Sep 12 '13 at 14:20

1 Answers1

1

"OR" case

Simpler and faster:

SELECT DISTINCT pf.product_id 
FROM   product_features    pf
LEFT   JOIN features       f  USING (feature_id)
LEFT   JOIN feature_groups fg USING (feature_group_id)
WHERE (f.feature_id = ANY (_my_arr) 
AND    fg.disjunction)
OR     _my_arr  = '{}';

... where _my_arr can be '{11, 12, 13}'::int[] or '{}'::int[]. If _my_arr would be NULL use _my_arr IS NULL instead.

  • Due to operator precedence AND binds before OR and parentheses are not required. They may improve readability, though.

  • DISTINCT or GROUP BY .. either is good here.

  • AND fg.disjunction .. since this is a boolean type, you can shorten the syntax.

  • JOINs are generally faster than another IN clause.

  • USING is just a notational shortcut that works with your (useful!) naming convention.

Or, even faster (for more than 1 feature) - and simpler to split cases:

SELECT product_id
FROM   products p
WHERE  EXISTS (
   SELECT 1
   FROM   product_features pf
   JOIN   features         f  USING (feature_id)
   JOIN   feature_groups   fg USING (feature_group_id)
   WHERE  pf.product_id = p.product_id
   AND    f.feature_id = ANY (_my_arr)
   AND    fg.disjunction
   )
OR     _my_arr  = '{}';

I would rather split the case in your app depending on input (with features / no features). Trivial to do with the second form.

"AND" case

This is a classical case of relational division. We have assembled a whole arsenal of query techniques to deal with it in under this related question:
How to filter SQL results in a has-many-through relation

Could be:

SELECT product_id
FROM   product_features p1
JOIN   product_features p2 USING (product_id)
JOIN   product_features p3 USING (product_id)
...
WHERE  p1.feature_id = 43
AND    p2.feature_id = 53
AND    p3.feature_id = 63
...

I am ignoring NOT feature_groups.disjunction in the example since it is not in the question either. Add it if you need it.
I would select valid feature_id before building the query.

-> SQLfiddle demo.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I have the impression that I didn't express myself clearly enough. Your "OR" query looks much nicer than mine and I do appreciate your hints but it still doesn't work as needed. The WHERE IN clause contains set of feature ids selected by the customer. If none of these features belongs to a group with "disjunction is true" attribute or the customer doesn't select any feature then the query returns no product but in such a case, it should return all products. – rafis Sep 12 '13 at 15:41
  • Perfect, "OR" query does the job. "AND" query seems to be very fast however a little hard to maintain, especially if there are hundreds of features. Besides, the set of mixed feature ids are sent all together in an array, so it is not known which feature id belongs to "OR" or "AND" query. These features which belong, in this case, to "disjunction is false" group must be picked somehow. – rafis Sep 16 '13 at 07:51
  • 1
    @rafis: I am answering your question, and in quite some detail I might add. Should be of interest to the general public. I am not doing all your work. If you bump into another obstacle, open a new question, clearly defining *one* problem. – Erwin Brandstetter Sep 16 '13 at 11:45