2

**

The full question: The Shahrukh number of an actor is the length of the shortest path between the actor and Shahrukh Khan in the "co-acting" graph. That is, Shahrukh Khan has Shahrukh number 0; all actors who acted in the same film as Shahrukh have Shahrukh number 1; all actors who acted in the same film as some actor with Shahrukh number 1 have Shahrukh number 2, etc. Return all actors whose Shahrukh number is 2.

**

my SQL query:

#loading SQL module
%load_ext sql

#connect to the database
%sql sqlite:///Db-IMDB.db

%%time
%%sql

UPDATE Movie SET year = REPLACE(year, "I", "");
UPDATE Movie SET year = REPLACE(year, "V", "");
UPDATE Movie SET year = REPLACE(year, "X ", "");
UPDATE Movie SET title = LTRIM(title);
UPDATE Movie SET year = RTRIM(LTRIM(year));
UPDATE Movie SET rating = RTRIM(LTRIM(rating));
UPDATE Movie SET num_votes = RTRIM(LTRIM(num_votes));

UPDATE M_Producer SET pid = RTRIM(LTRIM(pid));
UPDATE M_Producer SET mid = RTRIM(LTRIM(mid));

UPDATE M_Director SET pid = RTRIM(LTRIM(pid));
UPDATE M_Director SET mid = RTRIM(LTRIM(mid));

UPDATE M_Cast SET pid = RTRIM(LTRIM(pid));
UPDATE M_Cast SET mid = RTRIM(LTRIM(mid));

UPDATE M_Genre SET gid = RTRIM(LTRIM(gid));
UPDATE M_Genre SET mid = RTRIM(LTRIM(mid));

UPDATE Genre SET gid = RTRIM(LTRIM(gid));
UPDATE Genre SET name = RTRIM(LTRIM(name));

UPDATE Person SET name = RTRIM(LTRIM(name));
UPDATE Person SET pid = RTRIM(LTRIM(pid));
UPDATE Person SET gender = RTRIM(LTRIM(gender));

%%time
%%sql

select distinct PID, 
Name 
from Person natural 
join M_Cast
where Name != ‘Shah Rukh Khan’ and MID in 
(select MID from M_Cast 
 where PID in 
 (select PID 
  from Person natural 
  join M_Cast
  where Name != ‘Shah Rukh Khan’ and MID in 
  (select MID
   from Person natural 
   join M_Cast
   where Name != ‘Shah Rukh Khan’)))and PID not in 
(select PID
 from Person natural 
 join M_Cast where Name != ‘Shah Rukh Khan’ and MID in 
 (select MID
  from Person natural 
  join M_Cast where Name = ‘Shah Rukh Khan’))
limit 7;

output: -- I a getting an error.

* sqlite:///Db-IMDB-Assignment1.db
(sqlite3.OperationalError) near "Rukh": syntax error
[SQL: select distinct PID, Name 
from Person natural 
join M_Cast
where Name != ‘Shah Rukh Khan’ and MID in 
(select MID from M_Cast 
 where PID in 
 (select PID 
  from Person natural 
  join M_Cast
  where Name != ‘Shah Rukh Khan’ and MID in 
  (select MID
   from Person natural 
   join M_Cast
   where Name != ‘Shah Rukh Khan’)))and PID not in 
(select PID
 from Person natural 
 join M_Cast where Name != ‘Shah Rukh Khan’ and MID in 
 (select MID
  from Person natural 
  join M_Cast where Name = ‘Shah Rukh Khan’))
limit 7;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Wall time: 51.7 ms

Schema:

find the schema to the problem statement here

DB: The DB link can be obtained here

Need help. Thanks in advance.

After editing from backticks to double-quotes on the strings::

%%time
%%sql

select distinct PID, 
Name 
from Person natural 
join M_Cast
where Name != "Shah Rukh Khan" and MID in 
(select MID from M_Cast 
 where PID in 
 (select PID 
  from Person natural 
  join M_Cast
  where Name != "Shah Rukh Khan" and MID in 
  (select MID
   from Person natural 
   join M_Cast
   where Name != "Shah Rukh Khan")))and PID not in 
(select PID
 from Person natural 
 join M_Castwhere Name != "Shah Rukh Khan" and MID in 
 (select MID
  from Person natural 
  join M_Castwhere Name = "Shah Rukh Khan"))
limit 7;

still facing error

* sqlite:///Db-IMDB-Assignment1.db
(sqlite3.OperationalError) near "!=": syntax error
[SQL: select distinct PID, Name 
from Person natural 
join M_Cast
where Name != "Shah Rukh Khan" and MID in 
(select MID from M_Cast 
 where PID in 
 (select PID 
  from Person natural 
  join M_Cast
  where Name != "Shah Rukh Khan" and MID in 
  (select MID
   from Person natural 
   join M_Cast
   where Name != "Shah Rukh Khan")))and PID not in 
(select PID
 from Person natural 
 join M_Castwhere Name != "Shah Rukh Khan" and MID in 
 (select MID
  from Person natural 
  join M_Castwhere Name != "Shah Rukh Khan"))
limit 7;]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Wall time: 6.98 ms

Just found out the correct number of rows for the solution is 25698 whereas, the above solution gives an output of 48 rows. Need your help. Thanks

  • P.C.: I have been executing this on my python notebook. Have tried "row_number() over partition", which gives an error, found out that MySQL doesn't use that. –  Jun 28 '20 at 06:24
  • 1
    @AkshayKumar row_number and other window functions require mysql 8.0 or mariadb 10.2. – ysth Jun 28 '20 at 08:32

2 Answers2

1

I'm just describing the logic, so the code below will not run on your schema.

You can get all actors who worked with Shahrukh Khan (SK=0) with

CREATE VIEW sk1 AS (                            /* SK=1 actors are... */
SELECT a1.id FROM actors AS a1                  /* ...those actors... */ 
  JOIN cast AS c1 ON (c1.actor_id = a1.id)      /* ...who casted... */
  JOIN cast AS sk ON (sk.film_id = c1.film_id)  /* ...in the same film cast... */
  JOIN actor AS a0 ON (sk.actor_id = a0.id AND a0.name = 'Shahrukh Khan')     /* as the actor, whose name is 'Shahrukh Khan' */
  WHERE a1.id != a0.id                          /* but are not him */
);

Now SK2 actors are among the other actors

CREATE VIEW oc AS (                             /* Other actors are... */
    SELECT c1.actor_id AS id FROM cast AS c1    /* ...those who casted... */ 
    LEFT JOIN sk1 ON (sk1.id = c1.actor_id)     /* ...related to actors in SK1... */
    LEFT JOIN actor AS a0 ON (c1.actor_id = a0.id AND a0.name = 'Shahrukh Khan')
                                                /* ...and the actors who are SK... */
    WHERE sk1.id IS NULL AND a0.id IS NULL      /* ...by not being there. */
) AS otherActors;

Actors who are in otherActors, and have casted with actors in SK1, have SK of 2:

SELECT COUNT(*) FROM (
SELECT otherActors.id FROM otherActors AS oc
    JOIN cast AS c1 ON (oc.id = c1.actor_id)
    JOIN cast AS c2 ON (c1.film_id = c2.film_id)
    JOIN sk1 ON (c2.actor_id = sk1.id)
) AS sk2;

You will notice that there are some useless JOINs up there (I never need data from actor a1, as the data I get, a1.id, is by definition the same as c1.actor_id). It might come in handy if you need the names, though. Or you can do a further JOIN when (and if) you need actor information beyond their ID. I also never need data from the films. Using actor forces each actor to only be in the set once, though; if I had used cast straight away, then I might well have found repeating actor ID's, forcing me to add a DISTINCT clause.

Finally, this is a non-recursive, non-programmable implementation. SK=3 would force me to add another view and further complications. MySQL 8.0+ has recursive CTE support, which completely changes the game (you can adapt this answer, but beware of loops; you want to add an explicit check so that the "next" set always fishes among actors with no SK number assigned).

LSerni
  • 55,617
  • 10
  • 65
  • 107
0

You are using backticks to enclose string literals; mysql requires either single or double quotes instead. In mysql, backticks are used to quote identifiers that might otherwise conflict with reserved words, not string literals.

So replace every

‘Shah Rukh Khan’

with

"Shah Rukh Khan"
ysth
  • 96,171
  • 6
  • 121
  • 214
  • Firstly thank you. I have tried, but still getting error, I have updated the problem info. area above. Need your help. –  Jun 28 '20 at 08:50
  • Okay, I have got certain output after separating **"where"** clause in the code, I am not getting the correct answer for the problem statement. Can you please help me in getting the right solution for the above. –  Jun 28 '20 at 09:01