1

I have tree tables. videos, categories and video_categories (which is basically a pivot table between videos and categories).

I am trying to get all the videos that do not have certain categories. The problem is that my result is containing the same video more than once, and i only need it once.

This is what i got

SELECT videos.* FROM videos
JOIN video_categories ON videos.id = video_categories.video_id
WHERE category_id NOT IN (64452, 1031, 32595, 1015, 26484, 1019)
ORDER BY downloads

And the result i get is containing one video the same number of times as the videos number of categories. Also one more thing is that videos table has 55 300 rows, and pivot table video_categories has 400 000 rows, so i cant just use DISTINCT as than query runs for about 10 seconds.

I am calling this query from PHP, and excluding categories manually from retrieved is not possible because of pagination.

Any suggestions how can i do this?

EDIT One video can have many categories. So it is many to many relation.

happy_fist
  • 174
  • 3
  • 11
  • Consider providing proper DDLs (and/or an sqlfiddle) TOGETHER WITH THE DESIRED RESULT SET. You probably do not have tree tables. – Strawberry May 29 '14 at 16:28
  • For this query i am using nothing except these tree tables. The reason I cannot provide whole DDL is because the client data is sensitive, so i just got the excerpt. But i can replicate result i got and desired result. Will this be enough? – happy_fist May 29 '14 at 16:40
  • It will help. also, I'm not sure the query you posted is syntactically correct. There's no FROM clause. – Jaaz Cole May 29 '14 at 16:42
  • @happy_fist No one suggested that you should provide sensitive client data. And you have 'three' tables, not 'tree' tables. – Strawberry May 29 '14 at 16:43
  • @Strawberry 'tree' was a mistake, as i saw it in your first comment. – happy_fist May 29 '14 at 16:57

2 Answers2

2

Your query as you currently have it is not only inefficient it is wrong (aside from the missing from clause which I assume is a typo), consider a video has categories 1, and 1031, when you run:

SELECT videos.* FROM Videos
JOIN video_categories ON videos.id = video_categories.video_id
WHERE category_id NOT IN (64452, 1031, 32595, 1015, 26484, 1019)
ORDER BY downloads;

This video will be returned because category 1 is not in the list of categories provided. What you would need to do is find all the videos that have those categories, then exclude those videos. I would usually do this using NOT EXISTS as I think it is more logical:

SELECT  v.*
FROM    Videos AS v
WHERE   NOT EXISTS
        (   SELECT  1
            FROM    video_categories AS vc
            WHERE   vc.video_id = v.id
            AND     vc.category_id IN (64452, 1031, 32595, 1015, 26484, 1019)
        );

However, MySQL optimises this kind of query better using LEFT JOIN/IS NULL:

SELECT  v.*
FROM    Videos AS v
        LEFT JOIN video_categories AS vc
            ON vc.video_id = v.id
            AND vc.category_id IN (64452, 1031, 32595, 1015, 26484, 1019)
WHERE   vc.Video_ID IS NULL;

If this is still not fast then you are probably missing logical indexes. In your Videos table ID should be the clustering key, and in video_categories you should have a compound clustering key of (video_id, category_id).

Community
  • 1
  • 1
GarethD
  • 68,045
  • 10
  • 83
  • 123
  • Thanks for your help, i will get back to you as soon as i try this. Been a while with this problem, i need to clear my head first. – happy_fist May 29 '14 at 17:17
  • Your solution also works, but benchmark has shown that the query vav has written runs faster. That is the reason i have choosen his answer. Thanks GarethD – happy_fist May 30 '14 at 08:31
1

You have table video and video_categories. Relation between them is unknown at the moment.

If video_categories is one to one with video table then your query would not return duplicate records.

So, I am assume that there are multiple video_categories for the same video.

In this case, here is a posible solution:

select * from video 
where video_id in 
(select video_id from video_category 
where category_id not in (list of meaninless ids))
vav
  • 4,584
  • 2
  • 19
  • 39
  • It seems to be right, but i will try this as soon as I clear my head...currently i cannot think :/ been with this problem for quite a while. Thanks for your help, will get back to you when i try this again – happy_fist May 29 '14 at 17:16
  • Yep this does the work, it runs pretty fast. At least 7 times faster than my query. Thanks vav :D – happy_fist May 30 '14 at 08:29