
Can someone please help me clarify this?

I am currently using collaborative filtering (ALS), which returns a recommendation list with a score for each recommended item. In addition, I boost an item's score by 0.1 if it has a tag matching a preference the user has specified, such as "romantic movies". I consider this a hybrid approach, since it boosts the collaborative filtering results with content-based filtering (please correct me if I am wrong).
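To make the setup concrete, here is a minimal sketch of the boosting step. The item ids, scores, and tags are made up for illustration, and `boost_by_tags` is a hypothetical helper, not part of any ALS library; the CF scores are assumed to arrive as a plain dict.

```python
from typing import Dict, List, Set, Tuple


def boost_by_tags(
    cf_scores: Dict[str, float],
    item_tags: Dict[str, Set[str]],
    preferred_tags: Set[str],
    boost: float = 0.1,
) -> List[Tuple[str, float]]:
    """Add a fixed boost to every CF-scored item that shares a tag with the user."""
    boosted = {}
    for item, score in cf_scores.items():
        # An item qualifies if at least one of its tags is in the user's preferences.
        if item_tags.get(item, set()) & preferred_tags:
            score += boost
        boosted[item] = score
    # Highest score first.
    return sorted(boosted.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical ALS output and item metadata.
cf_scores = {"movie_a": 0.82, "movie_b": 0.78, "movie_c": 0.40}
item_tags = {
    "movie_a": {"action"},
    "movie_b": {"romantic"},
    "movie_c": {"romantic", "comedy"},
}
ranked = boost_by_tags(cf_scores, item_tags, {"romantic"})
```

With these numbers the boost is enough to move `movie_b` ahead of `movie_a`, while `movie_c` is boosted but stays last.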

Now, what if I took the same approach without collaborative filtering? Would it be considered content-based filtering, since I would still be recommending dishes based on the content and attributes of each dish that match what the user has said they like (such as "romantic movies")?

The reason I'm confused is that the content-based filtering I've seen applies an algorithm such as Naive Bayes, whereas this approach would amount to a simple search over the items' contents.

AlphaWolf

1 Answer


I'm not sure you can do what you suggest, because without CF you have no score to boost.

You are indeed using a hybrid, much the same as the Universal Recommender. To do purely content-based recommendations, you have to implement two methods:

  • Personalized recommendations: here you have to look at the content of the items the user preferred and find items with similar content. This can be done by using something like the Mahout spark-rowsimilarity job to create a model of the form item: list-of-similar-items, then indexing the results with a search engine and using the user's preferred item ids as the query. This is being added to the Universal Recommender.
  • "People who liked this also liked these": these are items similar to the one being viewed, for example, and are the same for all users. They are not personalized, so they are useful even for anonymous users with no history. This can be done with the same indexed ids as above, but using the items similar to the one being viewed as the query. One might think to return only the similar items themselves, but by using them as a query you can put the categorical boost in the search engine query and get boosted items back. This already works in the Universal Recommender, but the similar items are not in the model yet.
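The two query styles above can be sketched in plain Python. This is only an illustration under loud assumptions: Jaccard overlap of tag sets stands in for the Mahout similarity job, a dict scan stands in for the search engine index, and `build_similarity_model` and `query` are hypothetical names, not Mahout or Universal Recommender APIs.

```python
from itertools import combinations


def jaccard(a, b):
    """Tag-set overlap, used here as a stand-in content similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_model(item_tags, top_n=2):
    """Build the 'item: list-of-similar-items' model from item content."""
    pairs = {item: [] for item in item_tags}
    for a, b in combinations(item_tags, 2):
        sim = jaccard(item_tags[a], item_tags[b])
        if sim > 0:
            pairs[a].append((b, sim))
            pairs[b].append((a, sim))
    return {
        item: [other for other, _ in
               sorted(found, key=lambda p: p[1], reverse=True)[:top_n]]
        for item, found in pairs.items()
    }


def query(model, query_ids, item_tags, boost_tag=None, boost=0.5, exclude=()):
    """Crude search-engine stand-in: score each item by how many query ids
    appear in its similar-items field, plus an optional categorical boost."""
    query_set = set(query_ids)
    hits = {}
    for item, similar in model.items():
        if item in exclude:
            continue
        score = float(len(query_set & set(similar)))
        if score == 0:
            continue  # no match, so the item is not returned at all
        if boost_tag is not None and boost_tag in item_tags[item]:
            score += boost
        hits[item] = score
    # Best score first; ties broken by item id for a stable order.
    return sorted(hits, key=lambda i: (-hits[i], i))


# Hypothetical catalog.
item_tags = {
    "a": {"romantic", "drama"},
    "b": {"romantic", "comedy"},
    "c": {"action", "thriller"},
    "d": {"action", "comedy"},
    "e": {"drama", "thriller"},
}
model = build_similarity_model(item_tags)

# Personalized: the query is the ids of items the user preferred.
personalized = query(model, ["a"], item_tags, exclude={"a"})
personalized_boosted = query(model, ["a"], item_tags,
                             boost_tag="thriller", exclude={"a"})

# Item-based: the query is the list of items similar to the one being viewed.
also_liked = query(model, model["d"], item_tags, exclude={"d"})
```

In this toy data the "thriller" boost reorders the personalized results from `b, e` to `e, b`, which is the point of running the similar items through a query instead of returning them directly.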

That said, mixing content with collaborative filtering will almost surely give better results, since CF works better when the data is available. The only times to rely on purely content-based recommendations are when your catalog consists of one-off items that never accumulate enough interactions for CF, or when you have rich content with a short lifetime, like breaking news.

BTW, anyone who wants to help add the pure content-based part to the Universal Recommender can contact its new maintainers at ActionML.com.

pferrel
  • Thank you, Pat. I have one more question: for the first approach you suggested (personalized recommendations), can this be done with an arbitrary score if the user has already specified tags that they like? – AlphaWolf Sep 29 '15 at 23:54
  • Generally, scores in content-based approaches are similarity scores, such as the cosine of the angle between multi-dimensional term vectors, or the log-likelihood ratio score used by the Mahout jobs. Search engines use TF-IDF-weighted term frequencies after running the text through term analyzers (to stem and n-gram it). The user "scores" only by showing interest; after that, similarity of content is used. If you had user preferences/ratings, you would probably be better off with collaborative filtering. – pferrel Oct 01 '15 at 00:39
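As a rough illustration of the cosine score mentioned in the last comment, here is a minimal sketch of TF-IDF-weighted cosine similarity. It assumes the text is already tokenized (no stemming or n-grams), uses the plain `log(N / df)` form of IDF, and the item descriptions are invented; real search engines use their own analyzers and scoring variants.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn {doc_id: [terms]} into {doc_id: {term: tf * idf}}."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    return {
        doc_id: {t: count * math.log(n / df[t])
                 for t, count in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-tokenized item descriptions.
docs = {
    "a": ["romantic", "love", "paris"],
    "b": ["romantic", "love", "comedy"],
    "c": ["action", "car", "chase"],
}
vectors = tfidf_vectors(docs)
sim_ab = cosine(vectors["a"], vectors["b"])  # shares two terms
sim_ac = cosine(vectors["a"], vectors["c"])  # shares no terms
```

Items `a` and `b` share terms and get a positive similarity, while `a` and `c` share none and score zero; no user-assigned score is involved anywhere.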