1

I have a table called Lives:

create table Lives(
    animal varchar(10) not null,
    year int not null,
    zoo varchar(10),
    primary key(animal,year)
);

With the given data:

ANIMAL     YEAR        ZOO
---------- ----------- ----------
joe               2000 a
joe               2001 a
joe               2002 d
joe               2004 c
fred              2002 b
fred              2004 c
jane              2000 a
jane              2001 b
jane              2002 b
jack              2000 a
jack              2001 a
jack              2002 d
jack              2004 c
judy              2004 b
ruby              2003 d
alfred            2006 a

It consists of the name of the animal, a year and the zoo it was in that year.

I need a query that finds the pairs (a, b) of animals that have always been in the same zoo throughout all years, and such that a is lexicographically smaller than b (i.e., a < b). More precisely, such pairs (a, b) satisfy the following condition: if animal a lives in zoo z during year y then b also lives in zoo z during year y, and vice versa.

So the output for my example data would be:

Animal  Animal
------- -------
jack    joe

Up until now I've constructed this query:

SELECT l1.animal, l2.animal
FROM Lives as l1, Lives as l2 
WHERE l2.year = l1.year and l1.animal > l2.animal

It gives me animals that have been in the zoo for 1 year. I don't now how to continue from this.

I'm going to use this query in my sqlj program. Is it possible to construct a query that satisfies my desired result, or should I go on from my current query and implement the rest in sqlj?

Valerie94
  • 303
  • 2
  • 9
  • what is your expected O/P – Ankit Bajpai Oct 28 '14 at 13:49
  • @AnkitBajpai I've added expected output in my question. – Valerie94 Oct 28 '14 at 13:52
  • Why do you want to use SQLj? you can use normal JDBC? SQLj is for static SQL, and it needs extra steps to compile your Java classes. For a normal query, you probably do not need that. What you probably mean is if you can retrieve what you want with a single query OR you need to do an extra process in your applications. – AngocA Oct 28 '14 at 14:03
  • @AngocA I do need SQLJ as it is part of the task I have to make. Most of the programs just consist of a single query and iterating through the results. I just added it into my question because maybe a single query to get my result is not possible and I would be forced to do some java operations on the result of some query. – Valerie94 Oct 28 '14 at 14:10
  • @jpw You are right. I was so happy to get an answer that worked on my test data, so I accepted it too early I guess. Thanks for your answer, looks really good! – Valerie94 Oct 28 '14 at 16:02
  • @ValerieV... For further discussion with @jpw, can you tell us what result do you want to get if add `['jane',2004, 'b']` and `['joe', 2005, 'c']` to the table? – Jaugar Chang Oct 29 '14 at 04:49

2 Answers2

2

I think what you want is an exact relational division, that returns all pairs such that one pair can not have any zoo or year that the other does not have.

A common way to do this is to use a double negation through two correlated sub-queries. It's a bit hard to follow but it should give you the correct result.

-- select all distinct pairs such that...
SELECT * FROM (
    SELECT a.animal AS animal1, b.animal AS animal2
    FROM lives a
    INNER JOIN lives b ON a.zoo = b.zoo AND a.year = b.year AND a.animal < b.animal 
) animals
WHERE NOT EXISTS (
       -- there does not exist any animal that is not...
        SELECT * FROM lives b
        WHERE b.animal = animals.animal2
        AND NOT EXISTS (
             -- in the set of animals that share year and zoo
                SELECT * FROM lives c
                WHERE c.animal = animals.animal1
                AND c.zoo = b.zoo AND c.year = c.year
                )
        )
GROUP BY animals.animal1, animals.animal2

Using a simple count to determine equality won't work as you'll get a match when the count is the same even though one animal has more zoos than the other. To verify this add this row:

ANIMAL  YEAR  ZOO
jane    2004  b

And the result you get from the accepted answer will be:

animal  animal
jane    jack
joe     jack
joe     jane

Whereas my solution gives:

animal1 animal2
jack    joe
jpw
  • 44,361
  • 6
  • 66
  • 86
  • Your solution will get [joe,jack] and [judy,jane] now, you could change select statment to `SELECT b.animal AS animal1, a.animal AS animal2` instead. – Jaugar Chang Oct 29 '14 at 02:55
  • @JaugarChang Yeah, I needed to make a corresponding change; missed that. Thanks. – jpw Oct 29 '14 at 02:58
1

Try this:

with years as
(
select 
    animal
   ,count(distinct year) as years
from lives
group by animal
)
select 
    t1.animal as animal1
   ,t2.animal as animal2
   --,t1.year as y1
   --,t2.year as y2
   --,t1.zoo as z1
   --,t2.zoo as z2
from 
    lives t1
left outer join
    lives t2
on
    t1.year=t2.year and t1.zoo=t2.zoo and t1.animal > t2.animal
left outer join
    years
on
    years.animal=t1.animal
group by
    t1.animal 
   ,t2.animal 
having
    count(distinct t2.year)=max(years.years)

SQL FIDDLE DEMO

Jaugar Chang
  • 3,176
  • 2
  • 13
  • 23
  • Try adding a row for [`jane,2004,b`] and you'll get a match for the pair`[judy,jane]`even though the relation is not symmetric. – jpw Oct 28 '14 at 15:50
  • No, it wouldn't because the join condition include zoo, so judy don't have relation in 2004 with jane. But I really love your solution, I try not exists first but failed. – Jaugar Chang Oct 28 '14 at 15:56
  • Yes it would, and did when I tried it, but if you change `t1.animal > t2.animal`to`t1.animal < t2.animal`it works. Turns out my solution had the same error... – jpw Oct 28 '14 at 16:02
  • @jpw I don't think that's an error, if using `t1.animal > t2.animal`(for OP's original code) will get `[joe,jack]`, that means the lexicographically small animal jack in same zoo through all 4 years with joe. If add `[jane,2004,b]` to table, will get `[joe,jack]` and `[judy,jane]`, they also match the conditions(jane in same zoo througt all years(only 2004) with judy). Change `t1.animal > t2.animal` to `t1.animal < t2.animal` get difference meaning of conditions, so will get `[jack,joe]` instead, `[jane,judy]` will be omitted because judy not in same zoo through all 4 years with jane. – Jaugar Chang Oct 29 '14 at 02:29
  • Your fiddle with the added row for [jane,2004,b]: http://sqlfiddle.com/#!15/84352/1 when using the `t1 > t2` order the maxyears will be judys years and the distinct count of janes years will be 1 for the only shared year, or at least I think that's what is happening. Changing the order to `t1 < t2` makes it right (or so it would seem). – jpw Oct 29 '14 at 03:07
  • In my approach, `t1>t2` or `t1t2` only get `[judy,jane]` pair. But your approach will get `[jack,joe]` and `[judy,jane]` both with `t1>t2`. see http://sqlfiddle.com/#!6/65640/3 – Jaugar Chang Oct 29 '14 at 03:36
  • This is interesting, I'm starting to believe that maybe both our approaches might have issues, pretty sure mine has some at least; going to think more about it tomorrow :) It's an interesting problem. – jpw Oct 29 '14 at 03:47