I have a table containing attributes with the following structure:

id: bigint unsigned autoincrement
product_id: bigint foreign key 
attribute_id: bigint foreign key
value:  varchar(100) 

I can query a single criterion like this:

SELECT DISTINCT product_id FROM product_attributes WHERE attribute_id = ? AND value = ?

However, I need to find products that match several such criteria at once, and I would like to avoid multiple database queries for performance reasons. Simply adding more conditions with AND won't work, because every condition tests the same two columns and no single row can satisfy all of them. What I am after is something like this:

SELECT DISTINCT product_id FROM product_attributes WHERE attribute_id = 1 AND value = 'Blue'
INTERSECT
SELECT DISTINCT product_id FROM product_attributes WHERE attribute_id = 2 AND value = '36'
INTERSECT
SELECT DISTINCT product_id FROM product_attributes WHERE attribute_id = 3 AND value = 'slim'

I have read about the INTERSECT operator, which seems like it might work, but I've also read that MySQL doesn't support it. A search through the MySQL 8 documentation produced no relevant result, and the query above, which I assume is correct, produces an error on MySQL.

I've also read that something similar could be achieved with an inner join, but all the examples I've found involve multiple tables. There might also be a better or simpler way to write the query that hasn't occurred to me. Or perhaps it's actually better to just send multiple queries and calculate the intersection outside of MySQL (though I would be very surprised). I would greatly appreciate help from anyone who has done something similar in the past.

Rick James
kaan_a
  • Have you ever heard of the `OR` operator? Or are you just using `AND`? – Zeljka Mar 05 '20 at 11:55
  • @Zeljka if only it were that simple. Using the OR operator will produce false positives. I would get products that match any of the criteria. So for example I would get products that are not blue but are slim or have a size of 36. Or I might get products that are blue but not slim and not size 36. – kaan_a Mar 05 '20 at 12:02
  • OK, you are not using the `OR` operator. Try this: `where (attribute_id = 1 AND value = 'Blue') OR (attribute_id = 2 AND value = '36') OR (attribute_id = 3 AND value = 'slim')` and don't use any INTERSECT, just one query with this condition. – Zeljka Mar 05 '20 at 12:03
  • EAV schema design is loaded with hassles. Condolences. – Rick James Mar 08 '20 at 23:13
  • This is a faq. Before considering posting please read the manual & google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. If you post a question, use one phrasing as title. See [ask] & the voting arrow mouseover texts. PS Your post does not contain a clear precise statement of the class of query that you are talking about. Until you do that you cannot effectively search. Also you are expecting us to guess from fragments & examples. – philipxy Mar 09 '20 at 00:21
  • Please in code questions give a [mre]--cut & paste & runnable code; example input (as initialization code) with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For errors that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL that includes DBMS & DDL, which includes constraints & indexes & tabular initialization. [ask] – philipxy Mar 09 '20 at 00:32
  • Does this answer your question? [Select values that meet different conditions on different rows?](https://stackoverflow.com/questions/477006/select-values-that-meet-different-conditions-on-different-rows) – philipxy Mar 09 '20 at 00:40

2 Answers

2

You need to use aggregation: count, per product, the rows that match any one of the conditions, and assert that the count equals the number of conditions:

SELECT product_id
FROM product_attributes
WHERE (attribute_id, value) IN ((1, 'Blue'), (2, '36'), (3, 'slim'))
GROUP BY product_id
HAVING COUNT(*) = 3
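
To see why the count matters, here is a minimal sketch that runs the same idea against a throwaway database. It makes several assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, the sample rows are invented, and the helper names (`make_db`, `match_any`, `match_all`) are hypothetical. The conditions are written as OR-ed pairs rather than the row-value `IN` list above, purely so the statement also runs on SQLite; the GROUP BY / HAVING logic is unchanged.

```python
import sqlite3

# Hypothetical sample rows: (product_id, attribute_id, value).
ROWS = [
    (1, 1, "Blue"), (1, 2, "36"), (1, 3, "slim"),  # satisfies all three conditions
    (2, 1, "Blue"), (2, 2, "38"), (2, 3, "slim"),  # wrong size
    (3, 1, "Red"), (3, 2, "36"), (3, 3, "slim"),   # wrong colour
    (4, 1, "Blue"), (4, 2, "36"),                  # has no cut attribute at all
]
CONDITIONS = [(1, "Blue"), (2, "36"), (3, "slim")]


def make_db(rows):
    """Build an in-memory table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_attributes ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " product_id INTEGER NOT NULL,"
        " attribute_id INTEGER NOT NULL,"
        " value VARCHAR(100) NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO product_attributes (product_id, attribute_id, value)"
        " VALUES (?, ?, ?)",
        rows,
    )
    return conn


def _where(conditions):
    """Return an OR-ed WHERE clause and its flat parameter list."""
    if not conditions:
        raise ValueError("at least one condition is required")
    clause = " OR ".join(["(attribute_id = ? AND value = ?)"] * len(conditions))
    params = [item for pair in conditions for item in pair]
    return clause, params


def match_any(conn, conditions):
    """Products matching at least one condition (OR alone)."""
    clause, params = _where(conditions)
    sql = (
        f"SELECT DISTINCT product_id FROM product_attributes WHERE {clause}"
        " ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql, params)]


def match_all(conn, conditions):
    """Products matching every condition (OR plus GROUP BY / HAVING)."""
    clause, params = _where(conditions)
    sql = (
        f"SELECT product_id FROM product_attributes WHERE {clause}"
        " GROUP BY product_id HAVING COUNT(*) = ?"
        " ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql, params + [len(conditions)])]


conn = make_db(ROWS)
print(match_any(conn, CONDITIONS))  # [1, 2, 3, 4]
print(match_all(conn, CONDITIONS))  # [1]
```

With OR alone, every product that matches any single condition comes back; the `HAVING COUNT(*) = 3` step keeps only the product for which all three conditions found a row.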
Nick
  • I have to say this definitely works in my initial testing and looks brilliant. I'm going to do some benchmarks before I choose the accepted answer though. – kaan_a Mar 05 '20 at 12:13
  • @Kaan absolutely - you shouldn't accept anything until you've figured out what is actually best. Note that if you can have duplicates of the conditions (e.g. more than one row with `attribute_id = 1` and `value = 'Blue'`) you will need to use `COUNT(DISTINCT attribute_id, value)` in the `HAVING` clause. – Nick Mar 05 '20 at 12:15
  • Do you mean multiple rows with the same product_id, attribute_id and value? Because I shouldn't have that. What would even be the point of that? I could, however, have rows with the same product_id and attribute_id but different values. – kaan_a Mar 05 '20 at 12:21
  • @Kaan yes, same product_id, attribute_id and value. As you say, what would be the point, but trust me, I've seen it... – Nick Mar 05 '20 at 12:23
  • I believe you :) Thanks for the heads up – kaan_a Mar 05 '20 at 12:55
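
The duplicate-row caveat raised in these comments can be illustrated with a small sketch. The assumptions: Python's built-in `sqlite3` stands in for MySQL, the rows are invented, and `products` is a hypothetical helper. SQLite's `COUNT(DISTINCT ...)` accepts only a single expression, so the sketch concatenates the two columns; on MySQL the two-argument form `COUNT(DISTINCT attribute_id, value)` from the comment should be usable directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_attributes ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " product_id INTEGER NOT NULL,"
    " attribute_id INTEGER NOT NULL,"
    " value VARCHAR(100) NOT NULL)"
)
conn.executemany(
    "INSERT INTO product_attributes (product_id, attribute_id, value)"
    " VALUES (?, ?, ?)",
    [
        (1, 1, "Blue"), (1, 2, "36"), (1, 3, "slim"),    # genuine match
        (2, 1, "Blue"), (2, 1, "Blue"), (2, 3, "slim"),  # duplicated Blue row, no size 36
    ],
)

WHERE = (
    "(attribute_id = 1 AND value = 'Blue')"
    " OR (attribute_id = 2 AND value = '36')"
    " OR (attribute_id = 3 AND value = 'slim')"
)


def products(count_expr):
    """Run the GROUP BY / HAVING query with the given counting expression."""
    sql = (
        f"SELECT product_id FROM product_attributes WHERE {WHERE}"
        f" GROUP BY product_id HAVING {count_expr} = 3"
        " ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql)]


# COUNT(*) counts the duplicated Blue row twice, so product 2 slips through.
naive = products("COUNT(*)")
# Counting distinct (attribute_id, value) pairs ignores the duplicate.
strict = products("COUNT(DISTINCT attribute_id || ':' || value)")

print(naive)   # [1, 2]
print(strict)  # [1]
```

A unique constraint on (product_id, attribute_id, value) would rule the duplicate out at the schema level, in which case plain `COUNT(*)` is sufficient.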
2

This is the key / value store problem.

It's a slight pain in the neck to do what you want. Use JOIN operations to pivot the values into a row, like this:

    SELECT p.product_id, 
           color.value AS color,
           size.value AS size,
           cut.value AS cut
      FROM ( SELECT DISTINCT product_id FROM product_attributes ) p
      LEFT JOIN product_attributes color ON color.product_id = p.product_id 
                                        AND color.attribute_id = 1
      LEFT JOIN product_attributes size  ON size.product_id = p.product_id 
                                        AND size.attribute_id = 2
      LEFT JOIN product_attributes cut   ON cut.product_id = p.product_id 
                                        AND cut.attribute_id = 3

This generates a result set with one row per product/color/size/cut combination.

Then you can filter that result set like this:

SELECT * 
  FROM (
      SELECT p.product_id, 
             color.value AS color,
             size.value AS size,
             cut.value AS cut
        FROM ( SELECT DISTINCT product_id FROM product_attributes ) p
        LEFT JOIN product_attributes color ON color.product_id = p.product_id 
                                          AND color.attribute_id = 1
        LEFT JOIN product_attributes size  ON size.product_id = p.product_id 
                                          AND size.attribute_id = 2
        LEFT JOIN product_attributes cut   ON cut.product_id = p.product_id 
                                          AND cut.attribute_id = 3
       ) combinations
 WHERE color='Blue' AND size='36' AND cut='slim'

MySQL's query planner is smart enough that this doesn't run as slowly as you might guess, given the proper indexes.

The subquery in the FROM clause generates a comprehensive list of product ids from your product_attributes table, to which the specific attributes are then joined. If you have some other table for products, use that instead of the SELECT DISTINCT....
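
As a rough check of the pivot-and-filter query, the sketch below runs it against invented sample rows, using Python's built-in `sqlite3` as a stand-in for MySQL (both are assumptions, not part of the original answer). For comparison it also runs a plain INNER JOIN self-join, which is a different technique from the pivot above: it puts the conditions directly into the join clauses and addresses the question's doubt about whether an inner join can work against a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_attributes ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " product_id INTEGER NOT NULL,"
    " attribute_id INTEGER NOT NULL,"
    " value VARCHAR(100) NOT NULL)"
)
conn.executemany(
    "INSERT INTO product_attributes (product_id, attribute_id, value)"
    " VALUES (?, ?, ?)",
    [
        (1, 1, "Blue"), (1, 2, "36"), (1, 3, "slim"),  # satisfies all three conditions
        (2, 1, "Blue"), (2, 2, "38"), (2, 3, "slim"),  # wrong size
        (3, 1, "Red"), (3, 2, "36"), (3, 3, "slim"),   # wrong colour
        (4, 1, "Blue"), (4, 2, "36"),                  # has no cut attribute at all
    ],
)

# The pivot-and-filter query from the answer, reduced to the product id.
PIVOT_SQL = """
SELECT product_id
  FROM (
      SELECT p.product_id,
             color.value AS color,
             size.value AS size,
             cut.value AS cut
        FROM ( SELECT DISTINCT product_id FROM product_attributes ) p
        LEFT JOIN product_attributes color ON color.product_id = p.product_id
                                          AND color.attribute_id = 1
        LEFT JOIN product_attributes size  ON size.product_id = p.product_id
                                          AND size.attribute_id = 2
        LEFT JOIN product_attributes cut   ON cut.product_id = p.product_id
                                          AND cut.attribute_id = 3
       ) combinations
 WHERE color = 'Blue' AND size = '36' AND cut = 'slim'
 ORDER BY product_id
"""

# Alternative sketch: join the table to itself once per extra condition.
INNER_JOIN_SQL = """
SELECT DISTINCT color.product_id
  FROM product_attributes color
  JOIN product_attributes size ON size.product_id = color.product_id
                              AND size.attribute_id = 2 AND size.value = '36'
  JOIN product_attributes cut  ON cut.product_id = color.product_id
                              AND cut.attribute_id = 3 AND cut.value = 'slim'
 WHERE color.attribute_id = 1 AND color.value = 'Blue'
 ORDER BY color.product_id
"""

pivot_ids = [row[0] for row in conn.execute(PIVOT_SQL)]
join_ids = [row[0] for row in conn.execute(INNER_JOIN_SQL)]

print(pivot_ids)  # [1]
print(join_ids)   # [1]
```

Product 4 has no cut row, so the LEFT JOIN leaves its cut column NULL and the outer WHERE drops it; the INNER JOIN form drops it because the third join finds nothing to match.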

O. Jones
  • Thank you, though you missed the commas after `AS color` and `AS size` in both queries. Otherwise this definitely works. I do think the answer from @Nick is more elegant, but I am curious to know which is more performant and which causes more load, so I will run some benchmarks before picking one as correct. – kaan_a Mar 05 '20 at 12:54
  • Thanks for the catch on the commas. I often create views from queries like mine for use by people less into SQL intricacies. @nick's query is probably a little faster than mine. – O. Jones Mar 05 '20 at 13:40