Matching multiple attributes

Question

I have 3 grocery stores which sell a (possibly) different selection of fruit. I want to know which stores sell the same fruit and what that fruit is.
My table is:

ID | Fruit1  | Fruit2   | Fruit3   | Fruit4
---+---------+----------+----------+--------------
1  | apples  | bananas  |          |
2  | apples  | oranges  | cherries | lychees
3  | bananas | cherries | lychees  |

so my output should look like:

ID1 | ID2 | Fruit
----+-----+--------
1   | 2   | apples
1   | 3   | bananas    
2   | 3   | cherries
2   | 3   | lychees

Can you show us what you have tried so far? What errors are you getting? — Bob, Apr 04 '15 at 02:53
what database are you using? (oracle, sql server, mysql, etc.) — Brian DeMilia, Apr 04 '15 at 03:17

Schwern · Answer 1 · 2015-04-04T05:00:33.510

Your schema is not well designed for a relational database. Relational databases don't have lists, they have relations. If you want a thing (a store) to have many things (fruit) you don't make lots of columns, you make lots of rows. It's a bit odd to wrap your head around.

Your schema should be like this. I'm using MySQL syntax.

CREATE TABLE stores (
    id   INTEGER PRIMARY KEY AUTO_INCREMENT,
    name TEXT
);

CREATE TABLE fruit_sold (
    store_id INTEGER REFERENCES stores(id),
    fruit    TEXT
);

For every fruit a store sells, it will have a row in fruit_sold. Your data would look like this...

INSERT INTO stores (name)
VALUES ("Fruit Stand"), ("The Pits"), ("Fruit R U");

SELECT * FROM stores;
+----+-------------+
| id | name        |
+----+-------------+
|  1 | Fruit Stand |
|  2 | The Pits    |
|  3 | Fruit R U   |
+----+-------------+

INSERT INTO fruit_sold (store_id, fruit)
VALUES (1, "apples"), (1, "bananas");
INSERT INTO fruit_sold (store_id, fruit)
VALUES (2, "apples"), (2, "oranges"), (2, "cherries"), (2, "lychees");
INSERT INTO fruit_sold (store_id, fruit)
VALUES (3, "bananas"), (3, "cherries"), (3, "lychees");

SELECT * FROM fruit_sold;
+----------+----------+
| store_id | fruit    |
+----------+----------+
|        1 | apples   |
|        1 | bananas  |
|        2 | apples   |
|        2 | oranges  |
|        2 | cherries |
|        2 | lychees  |
|        3 | bananas  |
|        3 | cherries |
|        3 | lychees  |
+----------+----------+

To get names of stores and what fruits they sell, you do a join.

SELECT s.name, f.fruit
FROM stores s
JOIN fruit_sold f ON s.id = f.store_id;
+-------------+----------+
| name        | fruit    |
+-------------+----------+
| Fruit Stand | apples   |
| Fruit Stand | bananas  |
| The Pits    | apples   |
| The Pits    | oranges  |
| The Pits    | cherries |
| The Pits    | lychees  |
| Fruit R U   | bananas  |
| Fruit R U   | cherries |
| Fruit R U   | lychees  |
+-------------+----------+

Again, you work with lists in SQL in rows, not columns. To get all the stores which sell the same fruit, simply order the list by fruit first, and then by store name.

SELECT s.name, f.fruit
FROM stores s
JOIN fruit_sold f ON s.id = f.store_id
ORDER BY fruit, s.name;
+-------------+----------+
| name        | fruit    |
+-------------+----------+
| Fruit Stand | apples   |
| The Pits    | apples   |
| Fruit R U   | bananas  |
| Fruit Stand | bananas  |
| Fruit R U   | cherries |
| The Pits    | cherries |
| Fruit R U   | lychees  |
| The Pits    | lychees  |
| The Pits    | oranges  |
+-------------+----------+

But you want only stores which sell the same fruit. To do that you need to compare a table with itself. That's a self join. You do a self join by adding join on an existing table where the IDs don't equal. JOIN fruit_sold f2 ON f.store_id <> f2.store_id. Then you can check for stores that sell the same fruit with WHERE f.fruit = f2.fruit.

SELECT s.name, f.fruit
FROM stores s
JOIN fruit_sold f ON s.id = f.store_id
JOIN fruit_sold f2 ON f.store_id <> f2.store_id
WHERE f.fruit = f2.fruit
ORDER BY f.fruit, s.name;
+-------------+----------+
| name        | fruit    |
+-------------+----------+
| Fruit Stand | apples   |
| The Pits    | apples   |
| Fruit R U   | bananas  |
| Fruit Stand | bananas  |
| Fruit R U   | cherries |
| The Pits    | cherries |
| Fruit R U   | lychees  |
| The Pits    | lychees  |
+-------------+----------+

Some databases do have lists, like Postgres. You shouldn't rely on that and should learn proper relational design. Or use a NoSQL database which has the normal data structures most humans expect.

score 0 · Answer 2 · answered Apr 04 '15 at 06:31

I add an answer in SQL Server or TSQL; according that is not specified by a tag or ... :

I think you need to list all fruits by their special Id, So I use Union All for that list.
Now you need to use that list by creating a new View or use a With statement, that I choose With statement.
The simplest way that I can suggest you is using a join by a special filter of ID1 < ID2 when join is on Fruit1 = Fruit2:

with All_Sales (Id, Fruit) 
As (
select Id, Fruit1 as Fruit from Sales where not Fruit1 is null
union All
select Id, Fruit2 as Fruit from Sales where not Fruit2 is null
union all 
select Id, Fruit3 as Fruit from Sales where not Fruit3 is null
union all 
select Id, Fruit4 as Fruit from Sales where not Fruit4 is null
)

select as1.Id as Id1, as2.Id as Id2, as1.Fruit 
from All_Sales as as1 inner join All_Sales as as2 on as1.Fruit = as2.fruit And as1.Id < as2.Id

Matching multiple attributes

2 Answers2