3

I have several tables with many to many relationship. For example table video schema

CREATE TABLE `videos` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `description` varchar(10000) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=47606 DEFAULT CHARSET=utf8;

and two tables actors film_actors, and for example writers and film_writers

actor and film actors schemas

CREATE TABLE `actors` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `actors_UN` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=152216 DEFAULT CHARSET=utf8;


CREATE TABLE `film_actors` (
  `actor_id` int(11) NOT NULL,
  `film_id` int(11) NOT NULL,
  PRIMARY KEY (`film_id`,`actor_id`),
  KEY `FKrs472oyyff3hfwq10pyo94k1d` (`actor_id`),
  CONSTRAINT `FK12uvap3je50qd8cq3s0jf7h7r` FOREIGN KEY (`film_id`) REFERENCES `videos` (`id`) ON DELETE CASCADE,
  CONSTRAINT `FKrs472oyyff3hfwq10pyo94k1d` FOREIGN KEY (`actor_id`) REFERENCES `actors` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

writer

CREATE TABLE `writers` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `writers_UN` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=39875 DEFAULT CHARSET=utf8;

CREATE TABLE `film_writers` (
  `film_id` int(11) NOT NULL,
  `writer_id` int(11) NOT NULL,
  PRIMARY KEY (`writer_id`,`film_id`),
  KEY `FKh0kfwnarp6utb4f80ycj1lap` (`film_id`),
  CONSTRAINT `FK4hwpb5l48m0xps6jqn1wyjb63` FOREIGN KEY (`writer_id`) REFERENCES `writers` (`id`) ON DELETE CASCADE,
  CONSTRAINT `FKh0kfwnarp6utb4f80ycj1lap` FOREIGN KEY (`film_id`) REFERENCES `videos` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Im try selecet film info with actors and writers, but result contains many dublicate row. I use this sql query

SELECT
    vid.id,
    vid.title,
    vid.original_title_name,
    vid.`year`,
    ac.id as actor_id,
    ac.name as actor_name,
    wr.id as writer_id,
    wr.name as writer_name
from
    (
    select
        *
    From
        videos v
    where
        v.id = 1722
    ) vid
JOIN film_actors fa ON
    vid.id = fa.film_id
JOIN actors ac ON
    fa.actor_id = ac.id
JOIN film_writers fw ON
    vid.id = fw.film_id
JOIN writers wr ON
    fw.writer_id = wr.id

The output is a lot of duplicates row.

id   |title           |original_title_name |year |actor_id |actor_name         |writer_id |writer_name    |
-----|----------------|--------------------|-----|---------|-------------------|----------|---------------|
1722 |Назад в будущее |Back to the Future  |1985 |1796     |Майкл Дж. Фокс     |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |2648     |Джордж ДиЧенцо     |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |4807     |Криспин Гловер     |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |7601     |Кристофер Ллойд    |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |8195     |Лиа Томпсон        |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |8707     |Марк МакКлюр       |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |9242     |Фрэнсис Ли МакКейн |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |9602     |Уэнди Джо Спербер  |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |10545    |Клаудия Уэллс      |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |10546    |Томас Ф. Уилсон    |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |1796     |Майкл Дж. Фокс     |320       |Роберт Земекис |
1722 |Назад в будущее |Back to the Future  |1985 |2648     |Джордж ДиЧенцо     |320       |Роберт Земекис |
1722 |Назад в будущее |Back to the Future  |1985 |4807     |Криспин Гловер     |320       |Роберт Земекис |
1722 |Назад в будущее |Back to the Future  |1985 |7601     |Кристофер Ллойд    |320       |Роберт Земекис |
1722 |Назад в будущее |Back to the Future  |1985 |8195     |Лиа Томпсон        |320       |Роберт Земекис |
1722 |Назад в будущее |Back to the Future  |1985 |8707     |Марк МакКлюр       |320       |Роберт Земекис |
1722 |Назад в будущее |Back to the Future  |1985 |9242     |Фрэнсис Ли МакКейн |320       |Роберт Земекис |

Is it possible to somehow convert the result to something like that?

id   |title           |original_title_name |year |actor_id |actor_name         |writer_id |writer_name    |
-----|----------------|--------------------|-----|---------|-------------------|----------|---------------|
1722 |Назад в будущее |Back to the Future  |1985 |1796     |Майкл Дж. Фокс     |319       |Боб Гейл       |
1722 |Назад в будущее |Back to the Future  |1985 |2648     |Джордж ДиЧенцо     |320       |Роберт Земекис |
1722 |Назад в будущее |Back to the Future  |1985 |4807     |Криспин Гловер     |          |               |
1722 |Назад в будущее |Back to the Future  |1985 |7601     |Кристофер Ллойд    |          |               |
1722 |Назад в будущее |Back to the Future  |1985 |8195     |Лиа Томпсон        |          |               |
1722 |Назад в будущее |Back to the Future  |1985 |8707     |Марк МакКлюр       |          |               |
1722 |Назад в будущее |Back to the Future  |1985 |9242     |Фрэнсис Ли МакКейн |          |               |
1722 |Назад в будущее |Back to the Future  |1985 |9602     |Уэнди Джо Спербер  |          |               |
1722 |Назад в будущее |Back to the Future  |1985 |10545    |Клаудия Уэллс      |          |               |
1722 |Назад в будущее |Back to the Future  |1985 |10546    |Томас Ф. Уилсон    |          |               |
DmitryJS
  • 186
  • 2
  • 12
  • 1
    What version of MySQL are you using? – Gordon Linoff Oct 27 '18 at 17:27
  • There is no reason writer 319 to be associated with actor 1796. So, this is a bad idea. It looks distinctly like a presentation layer is sure to be, and should normally not be resolved in your data layer. For example, why is your film repeated so many times? You're trying to do something that does not make sense from a data layer perspective. – MatBailie Oct 27 '18 at 19:50
  • You are getting what you are asking for. "select film info with actors and writers" is not clear. Please use enough words & sentences to be clear about how the output is a function of input. What did your assignment ask for? We can try to guess what you want from your example input & output but if you don't learn to express yourself clearly in natural language then how can you even know what you want & how do you expect to express it in code? See hits googling stackexchange homework'--What *parts that work* can you do or find in a textbook? See [mcve]--which includes a clear specification. – philipxy Oct 27 '18 at 23:22
  • [Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097) – philipxy Oct 27 '18 at 23:24
  • @GordonLinoff i am use 8x version. – DmitryJS Oct 28 '18 at 04:12
  • @MatBailie Hello Mat Yes this look like a wrong way. I can not understand what to do. In theory, it is considered that you need to get maximum data in one request. This is obtained by my query above, but such an answer is very difficult to put in the object. Application performance decreases, mapping takes a long time and eats memory. Unfortunately, in reality, you have to do 8 separate requests to collect all the information on the film. – DmitryJS Oct 28 '18 at 04:14
  • Look to batched statements. Where more than one record set is returned because it consists of multiple queries. If you're having client side preformance issues because of that, make this issue your next question on SO. – MatBailie Oct 28 '18 at 20:52

1 Answers1

4

The reason you're getting "duplicate" rows is because one film has many actors and one film has many writers. There is no relationship at all between a film actor and a film writer

A simple query for education purposes:

SELECT * 
FROM
  films 
  INNER JOIN actors on films.id = actors.filmid
  INNER JOIN writers on films.id = writers.filmid

This is simplified to make my point, it hides the complexity of your tables that break the many:many relationships down to many:one

Now. If a film has 10 actors and 11 writers, this query will produce 110 rows

Every actor will be associated with every writer

Why? Because there is absolutely no relationship between a film actor and a film writer. The two things have nothing at all to do with each other. We haven't written anything in a JOIN condition to assert that some actor attribute is equal to some writer attribute because we can't - there just isn't anything to relate the two of them together

The only thing the database can possibly do as a result, is produce a row set that has a row for every actor/writer combination. The actor data is repeated 11 times and the writer data is repeated 10 times

Actor1/Writer1
Actor1/Writer2
...
Actor2/Writer1
Actor2/Writer2
...
Actor10/Writer11

There's no way round this "problem"-it is caused by trying to put the two unrelated things together on a single query. The only "solution" is to not put them in the same query. If you're writing a website like IMDB and you have a webpage for a film with two tabs, one for actors, and one for writers, run two separate queries (film join actors) and (film join writers) to fill the data for each tab - you flat out can't* do it in one query.

*when I say "can't", I mean "really really shouldn't". The "desired results" grid you've posted associates actor 1796 with writer 319 for no good reason at all - everything on a row is supposed to be related and these two entities share no relation other than some arbitrary decision that they both are ranked first when their ids are ordered ascending. While there are ways that that could be used to associate them and eliminate the Cartesian product from your query, it's a terrible code smell and indicative of the fact that you're approaching some other problem (that we can't see) in entirely the wrong way (sorry)

Caius Jard
  • 72,509
  • 5
  • 49
  • 80
  • Many thanks for the explanation. I'm just learning, this is an educational 'project'. In theory, they write that for maximum performance, you should extract as much data as possible per request) But in reality this is not always the case, my case illustrates this. I am writing a rest api, and I wanted to receive general information about the film, but I ran into this problem. But 8 separate subqueries work faster than one with many joins, while Java itself slows down when processing multiple result rows. Thanks again, in theory, as I understood, not everything is always smooth) – DmitryJS Oct 28 '18 at 04:27
  • For maximum performance you should extract as much *relevant* data per request. This doesn't mean you should use joins though, and strictly speaking here if you're writing an api that generates eg json and has a method like /getallmovieinfo/forrestgump consider using no joins - get the movie, get the actors, get the writers - three separate (multithreaded simultaneous if you want) queries. There isn't even any point asking the db to join eg film with actors because it's just going to repeat the film info, costing time and bandwidth when delivering the results. Overall the time is minuscule.. – Caius Jard Oct 29 '18 at 02:20
  • ...but compared to just getting the bare minimum info it's slower. Example: you're asked for Forrest Gump data. Pull the data out of films table based on name (index it). Extract the film id. Use film id to do just `filmactors join actors` and `filmwriters join writers`. Don't involve the film table again - shouldn't be needed once you have the film id. Use prepared statements so the db doesn't keep re planning and optimising the query. Probably the slowest thing in all this is opening and closing the db connection, so make sure you're using connection pooling – Caius Jard Oct 29 '18 at 02:28
  • (When I said no joins in my first comment that I can no longer edit, I meant the minimum number of joins-clarified in my second comment. No joins was in the context of my example, where films and actors have no middle-man table.. in your case there is a middle man table and a join would be needed to retrieve the actors if you know the film id. Hope that clears that up) – Caius Jard Oct 29 '18 at 02:30