3

I have this table in my database in oracle9i:

CREATE TABLE involved_in
(
    rid number,
    aid number NOT NULL,
    fid number NOT NULL,
    role varchar(80),
    note varchar(80), 
    job varchar(35),
    PRIMARY KEY(rid),
    FOREIGN KEY(fid) REFERENCES production(pr_id),
    FOREIGN KEY(aid) REFERENCES person(pid)
);

Which contains data about actors (aid) who worked in films (fid).

What I want to do is similar to working out the Bacon number

Except my kevin bacon is known for having the aid 517635.

I have no clue as to how to work out (only using SQL statements) the number of actors I have to connect a given actor (by another aid) with to find a connection to 517635.

The result of the query would be either a list of all the actors the given actor has to connect to get to my guy or just a number.

For that I thought I'd have to get first all the actors 517635 has worked with and those would have a 1 as a result for having worked directly with him. The table is not too big but big enough to make that inviable.

Examples, let's say Brad Pitt is my 517635. He worked with Angelina Jolie in Mr. And Mrs Smith, that would make her number 1. If Brad Pitt has never worked in any movie with Bruce Willis (let's say that's the case) but Angelina Jolie has been in one with him then Bruce Willis' number with respect to Brad Pitt would be 2.

In my query if the given number was Angelina's the result would be: "Brad Pitt 1" or simply "1" If the given number was Willis' the result would be: "Angelina Jolie Brad Pitt 2" or "2" Example of what is in the table:

INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(1, 33,                  1584953, 'Himself', 'NULL', 'actor');
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(2, 1135, 1999660, 'Himself', 'NULL', 'actor');
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(3, 1135, 2465724, 'Himself', 'NULL', 'actor');
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(4, 6003, 2387806, 'Himself', '(archive footage)', 'actor');
INSERT INTO involved_in(rid, aid, fid, role, note, job) VALUES(5, 13011, 1935123, 'Himself', 'NULL', 'actor');

I have nothing in my mind, I'm completely new to SQL and all I can think of leads to infinite loops with a variable to count the number of loops. Any ideas as to where to begin and, with luck, end?

Deduplicator
  • 44,692
  • 7
  • 66
  • 118
mjgalindo
  • 856
  • 11
  • 26

2 Answers2

2

This is difficult to verify without sample data, but the following answer is, at a minimum syntactically correct.

What you want here is clearly a recursive query. There are to ways to do this in Oracle: using common-table expressions (CTE) (i.e. a with clause); or using the connect by clause. CTEs are SQL standard and connect by is proprietary, but I find connect by less confusing, so that's what I'll use (also recursive CTEs are not supported in 9i).

SELECT   aid, MIN (lvl) AS distance
FROM     (SELECT     ii2.aid, LEVEL AS lvl
          FROM       involved_in ii1
                     JOIN involved_in ii2
                        ON ii1.fid = ii2.fid AND ii1.aid <> ii2.aid
          START WITH ii1.aid = 517635
          CONNECT BY NOCYCLE ii1.aid = PRIOR ii2.aid)
GROUP BY aid
  • The join connects all aid that share a fid to each other.
  • The connect by clause joins each set of connected aid to another set of aid.
  • The nocycle clause prevents infinite recursion.
  • level is keyword that gives us the number of times the recursion has occurred. We have to get the minimum because any given aid could connect to the starting aid via multiple paths.
  • If this query performs very badly, you may want to only get the results within a certain distance. The best way to do this would be to add and level <= 100 to the connect by clause (in this case, limiting the results to a distance of 100 or less).

As pointed out in the comments, 9i doesn't support nocycle. In addition, the OP is running out of temp space when running this query. The two are actually unrelated: if a cycle occurs, Oracle will throw an error; this implies that the server is running out of temp space before a cycle is found. I don't see any good way around either of these issues.

It is possible to specify an end point (to some degree). You can add AND ii1.aid <> 2 where 2 is the aid of the end-point to the on clause. This will cause the query to stop navigating a branch when it encounters that value. This may not help with the above issues however, as it will only short-circuit those branches where the provided values are found. It still has to evaluate all of the other branches in case the value is in them somewhere.

Allan
  • 17,141
  • 4
  • 52
  • 69
  • After trying to understand the syntax which now I think I do I get an error at line 7 invalid relational operator at "ii1.aid =" which makes me think I don't understand it after all – mjgalindo May 01 '15 at 14:46
  • Looks like oracle9i doesn't support nocycle. And without it it runs out of temp tablespace (ORA-30928) – mjgalindo May 01 '15 at 14:58
  • Is there any way to tell the query when to stop? Following the Brangelina example, if I set Brad as the starting value can I set Angelina as the ending value so that it doesn't have to build the entire graph (I guess that's what it's doing)? – mjgalindo May 01 '15 at 15:17
  • Ok, the end-point won't avoid running out of temp even when the result should be 1. I also tried limiting the level to <=1 and even then it can't run it so I guess the account I have to work with is just too simple to run anything like this. – mjgalindo May 01 '15 at 16:58
1

The best approach is to use hierarchical queries.

Oracle (even in 9i version) has CONNECT BY clause. http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries003.htm

In combination with START WITH and LEVEL this gets ridiculously easy.

Example:

SELECT last_name, employee_id, manager_id, LEVEL
FROM employees
START WITH employee_id = 100
CONNECT BY PRIOR employee_id = manager_id
ORDER SIBLINGS BY last_name;
jakub.petr
  • 2,951
  • 2
  • 23
  • 34