1

I am trying to find an efficient query to find all matching objects in a "join" table.

Given an object Adopter that has many Pets, and Pets that have many Adopters through a AdopterPets join table. How could I find all of the Adopters that have the same Pets?

The schema is fairly normalized and looks like this.

TABLE Adopter
  INTEGER id

TABLE AdopterPets
  INTEGER adopter_id
  INTEGER pet_id

TABLE Pets
  INTEGER id

Right now the solution I am using loops through all Adopters and asks for their pets anytime it we have a match store it away and can use it later, but I am sure there has to be a better way using SQL.

One SQL solution I looked at was GROUP BY but it did not seem to be the right trick for this problem.

EDIT

To explain a little more of what I am looking for I will try to give an example.

+---------+ +------------------+ +------+
| Adptors | | AdptorsPets      | | Pets |
|---------| +----------+-------+ |------|
| 1       | |adptor_id | pet_id| | 1    |
| 2       | +------------------+ | 2    |
| 3       | |1         | 1     | | 3    |
+---------+ |2         | 1     | +------+
            |1         | 2     |
            |3         | 1     |
            |3         | 2     |
            |2         | 3     |
            +------------------+

When you asked the Adopter with the id of 1 for any other Adopters that have the same Pets you would be retured id 3.

If you asked the same question for the Adopter with the id of 3 you would get id 1.

If you asked again the same question of the Adopter with id 2` you would be returned nothing.

I hope this helps clear things up!

Community
  • 1
  • 1
scommette
  • 41
  • 1
  • 6
  • What do you mean by "all of the `Adopters` that have the same `Pets`"? The same as *what*? Also, [what have you tried?](http://whathaveyoutried.com) –  Apr 15 '13 at 04:09
  • 1
    She pretty clearly says what she tried below the code block... – austinbv Apr 15 '13 at 04:13
  • `Group by` might work if you could, before applying the `group by`, *uniquely* aggregate all the pets adopted by adopter into a single row ... like `(adopterid,'firstadoptedpetid.secondadoptedpetid.etc')` with the `.` being whatever character you might feel like using as a conjoiner. I didn't say it'd be efficient, but I'm pretty sure something like that could work. – JayC Apr 15 '13 at 04:26
  • Your second edit makes your situation looks like you want the application application of relational algebra division (http://en.wikipedia.org/wiki/Relational_algebra#Division_.28.C3.B7.29). IIRC, those queries are certainly write-able, but aren't terribly easy to write, either. I'm tempted to work on it, but sleep calls. – JayC Apr 15 '13 at 04:34
  • I'm still not completely sure on what you're looking for. With your example, if you ask question for adopter with id 2 (I assume `adopter_id`?) you should not get nothing but would get 1 **and** 3: both of these adopters also have pet_id 1. – marcj Apr 15 '13 at 12:29

4 Answers4

1

I'm not sure if this is exactly what you're looking for but this might give you some ideas.

First I created some sample data:

create table adopter (id serial not null primary key, name varchar );
insert into adopter (name) values ('Bob'), ('Sally'), ('John');

create table pets (id serial not null primary key, kind varchar);
insert into pets (kind) values ('Dog'), ('Cat'), ('Rabbit'), ('Snake');

create table adopterpets (adopter_id integer, pet_id integer);
insert into adopterpets values (1, 1), (1, 2), (2, 1), (2,3), (2,4), (3, 1), (3,3);

Next I ran this query:

SELECT p.kind, array_agg(a.name) AS adopters
FROM pets p
JOIN adopterpets ap ON ap.pet_id = p.id
JOIN adopter a ON a.id = ap.adopter_id
GROUP BY p.kind
HAVING count(*) > 1
ORDER BY kind;

  kind  |     adopters     
--------+------------------
 Dog    | {Bob,Sally,John}
 Rabbit | {Sally,John}
(2 rows)

In this example, for each pet I'm creating an array of all owners. The HAVING count(*) > 1 clause ensures we only show pets with shared owners (more than 1). If we leave this out we'll include pets that don't share owners.

UPDATE

@scommette: Glad you've got it working! I've refactored your working example a little bit below to:

  • use @> operator. This checks if one array contains the other avoids need to explicitly set order
  • moved the grouped_pets subquery to a CTE. This isn't only solution but neatly allows you to both filter out the current_adopter_id and get the pets for that id

You might find it helpful to wrap this in a function.

WITH grouped_pets AS (
  SELECT adopter_id, array_agg(pet_id ORDER BY pet_id) AS pets
  FROM adopters_pets
  GROUP BY adopter_id
)
SELECT * FROM grouped_pets
WHERE adopter_id <> 3
  AND pets @> (
    SELECT pets FROM grouped_pets WHERE adopter_id = 3
  );
marcj
  • 336
  • 1
  • 5
1

Thank you all for the help, I used a combination of a few things:

  SELECT adopter_id
  FROM (
    SELECT adopter_id, array_agg(pet_id ORDER BY pet_id)
    AS pets
    FROM adopters_pets
    GROUP BY adopter_id
  ) AS grouped_pets
  WHERE pets = array[1,2,3]  #array must be ordered
  AND adopter_id <> current_adopter_id;

In the subquery I get pet_ids grouped by their adopter. The ordering of the pet_ids is key so that the results in the main query will not be order dependent.

In the main query I compare the results of the subquery to the pet ids of the adopter I am looking to match. For the purpose of this answer the pet_ids of the particular adopter are represented by [1,2,3]. I then make sure that that the adopter I am comparing to is not included in the results.

Let me know if anyone sees any optimizations or if there is a way to compare arrays where order does not matter.

scommette
  • 41
  • 1
  • 6
  • good stuff! I edited my answer to add a way to deal with the array order. It could probably be refactored even more and hopefully this will help. – marcj Apr 15 '13 at 16:57
0

If you're using Oracle then wm_concat could be useful here

select pet_id, wm_concat(adopter_id) adopters
from AdopterPets
group by pet_id ;
roblogic
  • 1,266
  • 2
  • 13
  • 23
  • Sorry I am using Postgres, I figured the tag would be enough. I will see if there is something similar in PG – scommette Apr 15 '13 at 04:33
  • 1
    Save you the trouble: http://stackoverflow.com/questions/2560946/postgresql-group-concat-equivalent – JayC Apr 15 '13 at 04:38
0
--
-- Relational division 1.0
-- Show all people who own *exactly* the same (non-empty) set
-- of animals as I do.
--

        -- Test data
CREATE TABLE adopter (id INTEGER NOT NULL primary key, fname varchar );
INSERT INTO adopter (id,fname) VALUES (1,'Bob'), (2,'Alice'), (3,'Chris');

CREATE TABLE pets (id INTEGER NOT NULL primary key, kind varchar);
INSERT INTO pets (id,kind) VALUES (1,'Dog'), (2,'Cat'), (3,'Pig');

CREATE TABLE adopterpets (adopter_id integer REFERENCES adopter(id)
        , pet_id integer REFERENCES pets(id)
        );
INSERT INTO adopterpets (adopter_id,pet_id) VALUES (1, 1), (1, 2), (2, 1), (2,3), (3,1), (3,2);

        -- Show it to the world
SELECT ap.adopter_id, ap.pet_id
        , a.fname, p.kind
FROM adopterpets ap
JOIN adopter a ON a.id = ap.adopter_id
JOIN pets p ON p.id = ap.pet_id
ORDER BY ap.adopter_id,ap.pet_id;
SELECT DISTINCT other.fname AS same_as_me
FROM adopter other
        -- moi has *at least* one same kind of animal as toi
WHERE EXISTS (
        SELECT * FROM adopterpets moi
        JOIN adopterpets toi ON moi.pet_id = toi.pet_id
        WHERE toi.adopter_id = other.id
        AND moi.adopter_id <> toi.adopter_id
                -- C'est moi!
        AND moi.adopter_id = 1 -- 'Bob'
        -- But moi should not own an animal that toi doesn't have
        AND NOT EXISTS (
                SELECT * FROM adopterpets lnx
                WHERE lnx.adopter_id = moi.adopter_id
                AND NOT EXISTS (
                        SELECT *
                        FROM adopterpets lnx2
                        WHERE lnx2.adopter_id = toi.adopter_id
                        AND lnx2.pet_id = lnx.pet_id
                        )
                )
        -- ... And toi should not own an animal that moi doesn't have
        AND NOT EXISTS (
                SELECT * FROM adopterpets rnx
                WHERE rnx.adopter_id = toi.adopter_id
                AND NOT EXISTS (
                        SELECT *
                        FROM adopterpets rnx2
                        WHERE rnx2.adopter_id = moi.adopter_id
                        AND rnx2.pet_id = rnx.pet_id
                        )
                )
        )
        ;

Result:

NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "adopter_pkey" for table "adopter"
CREATE TABLE
INSERT 0 3
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "pets_pkey" for table "pets"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 6
 adopter_id | pet_id | fname | kind 
------------+--------+-------+------
          1 |      1 | Bob   | Dog
          1 |      2 | Bob   | Cat
          2 |      1 | Alice | Dog
          2 |      3 | Alice | Pig
          3 |      1 | Chris | Dog
          3 |      2 | Chris | Cat
(6 rows)

 same_as_me 
------------
 Chris
(1 row)
joop
  • 4,330
  • 1
  • 15
  • 26