0

I have a table call production

factory_id | factory_name | product_id
         1 |            A |          1
         1 |            A |          2
         1 |            A |          3
         2 |            B |          3
         3 |            C |          1
         3 |            C |          2
         3 |            C |          3
         3 |            C |          4
         3 |            C |          5

I'm trying to develop a query that will return two factory name pair such that every product of factory1 is produced by factory2, result looked like:

factory_name_1 | factory_name_2
             A |              C
             B |              A
             B |              C

I have some nested self join and renames, but I can't wrap my head around how I can apply EXISTS or IN for this scenario that does "for each product produced by factory X do condition". Thanks to any help in advanced.

Update: Sorry that I forgot to paste my query:

select t0.fname0, t1.fname1
from    (
        select factory_id as fid0, factory_name as fname0, product_id as pid0, count(distinct factory_id, product_id) as pnum0
        from production
        group by factory_id
    ) t0
    join
    (
        select factory_id as fid1, factory_name as fname1, product_id as pid1, count(distinct factory_id, product_id) as pnum1
        from production
        group by factory_id
    ) t1
where t0.fid0 <> t1.fid1
    and t0.pnum0 < t1.pnum1
    and t0.pid0 = t1.pid1;

Update 2: production is the only table. Expected output factory1 and factory2 are just the rename of factory_name attribute.

Ray-Von-Mice
  • 429
  • 1
  • 4
  • 16
  • 1
    Show what you tried. We'll help you fix it, we won't write the whole thing for you. Try getting the count of all the matching products, and comparing that to the count of products in `factory_1`. – Barmar Mar 01 '20 at 06:41
  • 1
    what exactly is `factory1` and `factory2` in relation to the production table, and is this the only table provided or are there more? because in most cases one uses `EXISTS` and `IN` with another table for comparison. – de_classified Mar 01 '20 at 06:57
  • @FishingCode Yes production is the only table. factory1 and factory2 are just the rename of ```factory_name``` attribute. The reason I say ```EXISTS``` and ```IN``` is I think they are for " for each comparison " applied with subquery but I don't have a clue how to do that In this scenario. – Ray-Von-Mice Mar 01 '20 at 07:13
  • Please clarify via edits, not comments. Please in code questions give a [mre]--cut & paste & runnable code; example input (as initialization code) with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For errors that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL include DDL & tabular initialization. Isolate the first erroneous subexpression & its input & output. (Debugging fundamental.) – philipxy Mar 01 '20 at 20:30
  • "such that every product of factory1 is produced by factory2" This is relational division. This is a faq. Before considering posting please always google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. If you post a question, use one phrasing as title. See [ask] & the voting arrow mouseover texts. – philipxy Mar 01 '20 at 20:32
  • Does this answer your question? [check if a column contains ALL the values of another column - Mysql](https://stackoverflow.com/questions/28939367/check-if-a-column-contains-all-the-values-of-another-column-mysql) – philipxy Mar 01 '20 at 21:04

3 Answers3

2

You need to JOIN the table for each factory pairing to make sure they "join" on the same product_ids, otherwise you might end up with similar counts for DISTINCT product_ids but these will not necessarily refer to the same product_ids.

This is my take on it:

SELECT bfna,afna, pcnt FROM (
   SELECT a.factory_name afna, b.factory_name bfna, COUNT(DISTINCT b.product_id) commoncnt
   FROM tbl a LEFT JOIN tbl b ON b.factory_name!=a.factory_name AND b.product_id=a.product_id 
   GROUP BY a.factory_name, b.factory_name
) c
INNER JOIN (
   SELECT factory_name fna, COUNT(DISTINCT product_id) pcnt
   FROM TBL GROUP BY factory_name
) d ON fna=bfna AND commoncnt=pcnt
ORDER BY bfna,afna

You can find a demo here: https://rextester.com/JJGCK84904

It produces:

bfna    afna    commoncnt
A       C       3
B       A       1
B       C       1

For simplicity I left out the column factory_id as it does not add any information here.

Fun fact: as I am using only "bare-bone" SQL expressions, the above code will run on SQL-Server too without any changes.

Carsten Massmann
  • 26,510
  • 2
  • 22
  • 43
1

You can do it this way:

        select A as factory_name_1 , B as factory_name_2
        from
        (
            select A, B, count(*) as Count_
            from
            (
            select a.factory_name as A, b.factory_name as B 
            from yourtable a 
            inner join yourtable b
            on a.product_id = b.product_id and a.factory_id <> b.factory_id
            )a group by A, B
        )a 
        inner join 
       (select factory_name, count(*) as Count_ from yourtable group by factory_name) b 
       on a.A = b.factory_name and a.Count_ = b.Count_
       Order by 1

Output:

factory_name_1  factory_name_2
A                 C
B                 A
B                 C
FoggyDay
  • 11,962
  • 4
  • 34
  • 48
zip
  • 3,938
  • 2
  • 11
  • 19
0

The other solutions just seem more complicated than necessary. This is basically a self-join with aggregation:

with t as (
      select t.*, count(*) over (partition by factory_id) as cnt
      from tbl t
     )
select t1.factory_id, t2.factory_id, t1.factory_name, t2.factory_name, count(*)
from t t1 join
     t t2
     on t1.product_id = t2.product_id and t1.factory_id <> t2.factory_id
group by t1.factory_id, t2.factory_id, t1.factory_name, t2.factory_name, t1.cnt
having count(*) = max(t1.cnt);

Here is a db<>fiddle.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786