Get records order by count of related Many to Many association

Question

Scnerio:

https://www.funtraker.com is listing movies, tv shows and games. On show page of each resource(Movie, Tv Show etc) we want to list down the related resources.

Schema:

class Movie < AR::Base
  has_many :resource_genres, as: :resource
  has_many :genres, through: :resource_genres
end

class ResourceGenre
  belongs_to :resource, polymorphic: true
end

Now I want to get a list of related movies based on matched genre( two movies are related if both has 'comedy` genre). And these related movies need to order by max number of matched genres.

Well here is sample movies and the expected output.

#Input
Movie         Genres
Movie 1:      horror, comedy, action, war
Movie 2:      action, thriller, crime, animation  
Movie 3:      comedy, war, action, thriller
Movie 4:      crime, animation, action, war

#Expected output
movie1.related_movies => [ movie3, movie2  ]
movie4.related_movies => [ movie2, remaining-three-movies-in-any-order ]
movie3.related_movies => [ movie1, movie2, movie4]

Hopefully question make sense.

UPDATE: Looking for SQL only solution. I don't need to cache the results in any another table.

How many records will you be doing this with? Using ActiveRecord for a few hundred movies will work ok but this query will not scale. — eabraham, Jan 20 '15 at 16:27
possible duplicate http://stackoverflow.com/questions/8454478/algorithm-advice-calculating-related-stores-based-on-their-category-data — Saqib, Jan 20 '15 at 19:06
@eabraham yeah top 10 means first 10 entries. Entries would be order by count of max matches so first ten will also be top 10 :) — Naveed, Jan 21 '15 at 02:50

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

You need to order by the group count of the movies ids after joined with resource_genres, take a look at the following pure SQL methods:

Method #1 (Single Query)

Double joining the resource_genres table on itself to maintain self genres ids:

def related_movies
   Movie.select("movies.*, COUNT(*) AS group_count").
   joins(:resource_genres).
   joins("JOIN resource_genres rg ON rg.genre_id = resource_genres.genre_id").
   where("rg.resource_type = 'Movie' 
          AND rg.resource_id = ? 
          AND movies.id != ?", self.id, self.id).
   group('movies.id').
   order('group_count DESC')
end

Method #2 (2 Queries)

Plucking the genre_ids from self resource_genres on a separate query.

def related_movies
   Movie.select("movies.*, COUNT(*) AS group_count").joins(:resource_genres).
   where("resource_genres.genre_id IN (?) 
         AND movies.id != ?", self.resource_genres.pluck(:genre_id), self.id).
   group('movies.id').
   order('group_count DESC')
end

I'm going with second answer for now. Will benchmark and post here with which query was more efficient in few days. Thanks for answer and congrats for bounty :) BTW I was missing: "movies.*, COUNT(*) AS group_count" — Naveed, Jan 23 '15 at 05:44
I'm glad it helped . I usually prefer Method 1, try to keep your benchmarks on 1000+ records per table, and let us know. — mohameddiaa27, Jan 23 '15 at 10:30

Kashif Umair Liaqat · Answer 2 · 2015-01-26T08:45:34.463

0

If you are finding a solution in rails code, then it might solve your problem.

def related_movies
  scores_hash = {}
  Movie.joins(:resource_genres).where('resource_genres.genre_id' => resource_genres.pluck(&:genre_id)).where.not(id: self.id).distinct.find_each do |movie|
    scores_hash[movie] = (movie.resource_genres.pluck(:genre_id) & self.resource_genres.pluck(:genre_id)).count
  end
  Hash[scores_hash.sort_by { |movie, score| -score }].keys
end

edited Jan 26 '15 at 08:45

answered Jan 21 '15 at 08:43

Kashif Umair Liaqat

1,064
1
18
27

A better approach for joins will be `Movie.joins(:genres).where('genres.id IN(?)', genres.pluck(:id))` and I'm looking for SQL only solution :) – Naveed Jan 21 '15 at 23:10
If you have an index on genre_name then the performance will be same for my solution as well as your's :) – Kashif Umair Liaqat Jan 22 '15 at 08:13
You may handle `X.related_movies` should not return `X` – mohameddiaa27 Jan 23 '15 at 03:21
Behind the scenes `Movie.joins(:genres)` is a join of 3 tables, and so is every call to `movie.genres`. So you may consider using resource_genres table instead while adding some eager-loading. – mohameddiaa27 Jan 23 '15 at 03:26
@mohameddiaa27, thanks for pointing that out. I've made the changes and I really appreciate your help :) – Kashif Umair Liaqat Jan 26 '15 at 08:46

Get records order by count of related Many to Many association

Scnerio:

Schema:

2 Answers2

Method #1 (Single Query)

Method #2 (2 Queries)