SQL query for selecting products with same ingredients of other products

Question

I have a database that stores products "available on the market" and products "still in development" in two separate tables (market_product and dev_product). A third table (substance) contains all the substances a product can be made of. Two other tables (market_product_comp and dev_product_comp) maintain the product compositions.

I want to select products still in development that are made of the same ingredients as marketed products.

In the following (simplified) example, the query must select the product with ID = 2 from the dev_product table.

CREATE table market_product (ID SERIAL PRIMARY KEY);
CREATE table dev_product (ID SERIAL PRIMARY KEY);
CREATE table substance (ID SERIAL PRIMARY KEY);
CREATE table market_product_comp (prodID SERIAL, substID SERIAL, PRIMARY KEY(prodID,substID));
CREATE table dev_product_comp (devID SERIAL, substID SERIAL, PRIMARY KEY(devID,substID));

INSERT INTO market_product VALUES (1),(2);
INSERT INTO dev_product VALUES (1),(2);
INSERT INTO substance VALUES (1),(2),(3);
INSERT INTO market_product_comp VALUES (1,1),(1,2),(2,3);
INSERT INTO dev_product_comp VALUES (1,2),(2,1),(2,2);

How can I write such a query?

UPDATE:

Sorry, I hadn't noticed that I asked my question in an ambiguous way.

I want to select products still in development that have the same composition as at least one marketed product. For example, if there is a dev_product made of substances {1,2} and only one market_product made of substances {1,2,3}, I want to discard that dev_product, because it has a different composition. I hope this clarifies things.
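
To make the requirement concrete, here is a minimal Python sketch of the set-equality rule described in the update. It is not part of the original question: the dictionaries mirror the sample data above, and the function name `matching_dev_ids` is only illustrative.

```python
# Compositions from the sample data: product ID -> set of substance IDs.
market = {1: {1, 2}, 2: {3}}
dev = {1: {2}, 2: {1, 2}}


def matching_dev_ids(dev, market):
    """Return the IDs of dev products whose substance set is exactly
    equal to the substance set of at least one marketed product."""
    market_sets = {frozenset(subs) for subs in market.values()}
    return sorted(d for d, subs in dev.items() if frozenset(subs) in market_sets)


print(matching_dev_ids(dev, market))
```

Dev product 1 ({2}) is only a subset of marketed product 1 ({1,2}), so it is discarded; dev product 2 matches exactly.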

What database? Products should've been one table, using a status column to differentiate... — OMG Ponies, Sep 08 '09 at 15:55
When products have different attributes according to their type, and this is the case, I don't think using separate tables is a bad idea. — Hobbes, Sep 08 '09 at 16:40

score 1 · Answer 1 · answered Sep 09 '09 at 05:01

Here's a solution that relies on the fact that COUNT() ignores NULLs.

SELECT d1.devId, m1.prodId
FROM market_product_comp m1
CROSS JOIN dev_product_comp d1
LEFT OUTER JOIN dev_product_comp d2 
   ON (d2.substId = m1.substId AND d1.devId = d2.devId)
LEFT OUTER JOIN market_product_comp m2 
   ON (d1.substId = m2.substId AND m1.prodId = m2.prodId)
GROUP BY d1.devId, m1.prodId
HAVING COUNT(d1.substId) = COUNT(d2.substId)
   AND COUNT(m1.substId) = COUNT(m2.substId);

I tested this on MySQL 5.0.75, but it's all ANSI standard SQL so it should work on any brand of SQL database.
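
As a way to try the query above without a MySQL server, the sketch below runs it against the question's sample data in an in-memory SQLite database through Python's standard `sqlite3` module. Using SQLite, declaring the columns as INTEGER instead of SERIAL, and the function name `matching_pairs` are assumptions made for this illustration; they are not part of the original answer.

```python
import sqlite3

# Composition tables and sample rows from the question
# (INTEGER is used in place of SERIAL for SQLite).
SETUP = """
CREATE TABLE market_product_comp (prodID INTEGER, substID INTEGER, PRIMARY KEY (prodID, substID));
CREATE TABLE dev_product_comp (devID INTEGER, substID INTEGER, PRIMARY KEY (devID, substID));
INSERT INTO market_product_comp VALUES (1,1),(1,2),(2,3);
INSERT INTO dev_product_comp VALUES (1,2),(2,1),(2,2);
"""

# The query from this answer, unchanged apart from layout.
QUERY = """
SELECT d1.devId, m1.prodId
FROM market_product_comp m1
CROSS JOIN dev_product_comp d1
LEFT OUTER JOIN dev_product_comp d2
   ON (d2.substId = m1.substId AND d1.devId = d2.devId)
LEFT OUTER JOIN market_product_comp m2
   ON (d1.substId = m2.substId AND m1.prodId = m2.prodId)
GROUP BY d1.devId, m1.prodId
HAVING COUNT(d1.substId) = COUNT(d2.substId)
   AND COUNT(m1.substId) = COUNT(m2.substId)
"""


def matching_pairs():
    """Return (devId, prodId) pairs whose compositions are identical."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(matching_pairs())
```

Each returned row pairs a dev product with the marketed product it matches, which is the extra information this query provides over one that returns only dev product IDs.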

Bill Karwin

Bill, I like your solution too, especially because it is more informative (in the output I can see the matching products). I voted it up. But I can choose only one answer, and I choose Quassnoi's, because his solution seems to run faster than yours and is easier to understand (at least to me). – Hobbes Sep 09 '09 at 11:11
Thanks. No problem, I know the importance of understanding your code so it's maintainable! :-) – Bill Karwin Sep 09 '09 at 16:08

score 0 · Answer 2 · answered Sep 08 '09 at 15:18

select d.* from dev_product d
 left join dev_product_comp dpc on d.Id = dpc.devId
where dpc.substID in 
  (select mpc.substID from market_product_comp  mpc 
    left join market_product mp on mp.Id = mpc.prodId)

Gregoire

score 0 · Answer 3 · answered Sep 08 '09 at 15:25

Select only the dev product IDs for which all of the product's substances are used in market products.

select 
   dp.id
from 
   dev_product dp
   inner join dev_product_comp dpc on dp.id = dpc.devid
where 
   dpc.substid in (select substid from market_product_comp) 
group by 
   dp.id
having 
   count(*) = (select count(*) from dev_product_comp where devid = dp.id)

Excludes products with ANY ingredients not used in production.
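
A quick way to see what this query returns for the question's sample data is to run it in an in-memory SQLite database from Python. This is a sketch under assumptions that are not in the original answer: SQLite instead of the asker's database, INTEGER instead of SERIAL columns, `COUNT(*)` written out explicitly, and an illustrative function name `dev_ids_using_only_marketed_substances`.

```python
import sqlite3

# Only the tables this query touches, with the question's sample rows.
SETUP = """
CREATE TABLE dev_product (ID INTEGER PRIMARY KEY);
CREATE TABLE market_product_comp (prodID INTEGER, substID INTEGER, PRIMARY KEY (prodID, substID));
CREATE TABLE dev_product_comp (devID INTEGER, substID INTEGER, PRIMARY KEY (devID, substID));
INSERT INTO dev_product VALUES (1),(2);
INSERT INTO market_product_comp VALUES (1,1),(1,2),(2,3);
INSERT INTO dev_product_comp VALUES (1,2),(2,1),(2,2);
"""

QUERY = """
SELECT dp.id
FROM dev_product dp
INNER JOIN dev_product_comp dpc ON dp.id = dpc.devid
WHERE dpc.substid IN (SELECT substid FROM market_product_comp)
GROUP BY dp.id
HAVING COUNT(*) = (SELECT COUNT(*) FROM dev_product_comp WHERE devid = dp.id)
"""


def dev_ids_using_only_marketed_substances():
    """Return dev product IDs whose substances all occur in some marketed product."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return sorted(row[0] for row in conn.execute(QUERY))
    finally:
        conn.close()


print(dev_ids_using_only_marketed_substances())
```

On the sample data this returns both dev products (1 and 2), because the check is made against the pool of all marketed substances, not against the composition of a single marketed product. That is a looser rule than the exact-composition match described in the question's update.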

Quassnoi · Accepted Answer · 2009-09-09T00:25:32.607

In MySQL:

SELECT  *
FROM    dev_product dp
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    market_product mp
        WHERE   NOT EXISTS
                (
                SELECT  NULL
                FROM    dev_product_comp dpc
                WHERE   dpc.devID = dp.id
                        AND NOT EXISTS
                        (
                        SELECT  NULL
                        FROM    market_product_comp mpc
                        WHERE   mpc.prodID = mp.id
                                AND mpc.substID = dpc.substID
                        )
                )
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    market_product_comp mpc
                WHERE   mpc.prodID = mp.id
                        AND NOT EXISTS
                        (
                        SELECT  NULL
                        FROM    dev_product_comp dpc
                        WHERE   dpc.devID = dp.id
                                AND dpc.substID = mpc.substID
                        )
                )
        )

In PostgreSQL:

SELECT  *
FROM    dev_product dp
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    market_product mp
        WHERE   NOT EXISTS
            (
            SELECT  NULL
            FROM    (
                SELECT  substID
                FROM    market_product_comp mpc
                WHERE   mpc.prodID = mp.ID
                ) m
            FULL OUTER JOIN
                (
                SELECT  substID
                FROM    dev_product_comp dpc
                WHERE   dpc.devID = dp.ID
                ) d
            ON  d.substID = m.substID
            WHERE   d.substID IS NULL OR m.substID IS NULL
            )
        )

Neither of these queries uses COUNT(*): finding a single non-matching component is enough to stop evaluating the whole pair.

See these entries in my blog for explanations:

Matching whole sets (PostgreSQL, with FULL OUTER JOIN)
MySQL: Matching whole sets (MySQL, with EXISTS)
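
For readers who want to experiment, the sketch below runs the double NOT EXISTS query from the MySQL variant above in an in-memory SQLite database via Python's `sqlite3` module. Several things here are assumptions for illustration and not part of the original answer: the use of SQLite, INTEGER columns in place of SERIAL, `dev_product_comp.devID` as the join column (the name used in the question's schema), and the helper name `matching_dev_products`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE market_product (ID INTEGER PRIMARY KEY);
CREATE TABLE dev_product (ID INTEGER PRIMARY KEY);
CREATE TABLE market_product_comp (prodID INTEGER, substID INTEGER, PRIMARY KEY (prodID, substID));
CREATE TABLE dev_product_comp (devID INTEGER, substID INTEGER, PRIMARY KEY (devID, substID));
"""

# A dev product qualifies if some marketed product has no dev substance
# missing from it AND no substance of its own missing from the dev product.
QUERY = """
SELECT dp.ID
FROM dev_product dp
WHERE EXISTS (
    SELECT NULL FROM market_product mp
    WHERE NOT EXISTS (
        SELECT NULL FROM dev_product_comp dpc
        WHERE dpc.devID = dp.ID
          AND NOT EXISTS (
            SELECT NULL FROM market_product_comp mpc
            WHERE mpc.prodID = mp.ID AND mpc.substID = dpc.substID)
    )
    AND NOT EXISTS (
        SELECT NULL FROM market_product_comp mpc
        WHERE mpc.prodID = mp.ID
          AND NOT EXISTS (
            SELECT NULL FROM dev_product_comp dpc
            WHERE dpc.devID = dp.ID AND dpc.substID = mpc.substID)
    )
)
"""


def matching_dev_products(market_comp, dev_comp):
    """Load (productID, substanceID) pairs and return matching dev product IDs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        # Derive the product tables from the IDs that appear in the compositions.
        conn.executemany("INSERT INTO market_product VALUES (?)",
                         [(p,) for p in sorted({p for p, _ in market_comp})])
        conn.executemany("INSERT INTO dev_product VALUES (?)",
                         [(d,) for d in sorted({d for d, _ in dev_comp})])
        conn.executemany("INSERT INTO market_product_comp VALUES (?, ?)", market_comp)
        conn.executemany("INSERT INTO dev_product_comp VALUES (?, ?)", dev_comp)
        return sorted(row[0] for row in conn.execute(QUERY))
    finally:
        conn.close()


# Sample data from the question.
print(matching_dev_products([(1, 1), (1, 2), (2, 3)], [(1, 2), (2, 1), (2, 2)]))
```

Passing the case from the question's update, a dev product made of {1,2} against a single marketed product made of {1,2,3}, yields an empty result, since the second NOT EXISTS finds substance 3 unmatched.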
