1

I'm developing a website where users rate content (1-5 stars). I need to measure the popularity of the content (also referred to as importance/hotness/interest). My first thought was simply to sum the user ratings for each piece of content:

Popularity = SUM(Rating - 2.5)

If two users give it 5 stars and one gives it 2 stars, it gets a popularity of 2.5 + 2.5 - 0.5 = 4.5. The value is then dampened depending on how old the content is. I want it to be as accurate as possible, so I'm wondering whether this is "good enough" or whether there is a better way, e.g. by analyzing the distribution of ratings, or whether I must bring in more metrics (views, comments, shares, time spent on content, etc.).
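For concreteness, the formula can be sketched in Python as below. The question does not say how the age dampening is done, so the exponential decay and the 24-hour half-life here are assumptions for illustration only, as are the function and parameter names.

```python
NEUTRAL = 2.5  # midpoint subtracted from every rating, as in the formula above


def raw_popularity(ratings):
    """Popularity = SUM(rating - 2.5) over all ratings of one item."""
    return sum(r - NEUTRAL for r in ratings)


def damped_popularity(ratings, age_hours, half_life_hours=24.0):
    """Scale the raw score by an age-based factor.

    Assumption: exponential decay with a configurable half-life. The
    original text only states that older content is dampened.
    """
    decay = 0.5 ** (age_hours / half_life_hours)
    return raw_popularity(ratings) * decay


# Two 5-star ratings and one 2-star rating, as in the example above.
print(raw_popularity([5, 5, 2]))           # 4.5
print(damped_popularity([5, 5, 2], 24.0))  # 2.25 after one half-life
```

Note that this score grows with the number of ratings as well as with their level, so an item with many mildly positive ratings can outrank one with a few very high ratings.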

Pking

2 Answers

3

Bit of a classic question, this. Your approach is good, but does it take into account the reliability of the score? You hint that it doesn't.

The more ratings a post gets, the more reliably the ratings tell you the value.

Conversely, a single bad rating on its own should be trusted less.

Being able to account for the reliability of your data set, and thereby calculating what it tells us, is what Bayesian statistics is all about. You need a Bayesian average: see these articles here and an excellent set of resources here.

As this is a Stack Overflow question, here is one of many canonical SO questions about how to compute the average.
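As a rough illustration of the idea, one common form of the Bayesian average pulls each item's mean toward a prior mean, with the pull weakening as more ratings arrive. The names `prior_mean` and `prior_weight` and their default values are assumptions made for this sketch; in practice the prior mean is often set to the global mean rating across all items.

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=5):
    """Mean rating shrunk toward a prior.

    prior_mean:   assumed rating for an item with no data (hypothetical default).
    prior_weight: how many "virtual" ratings the prior counts for
                  (hypothetical default).
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


# A single 1-star rating barely moves the estimate away from the prior...
print(bayesian_average([1]))
# ...while a hundred 5-star ratings dominate it.
print(bayesian_average([5] * 100))
```

With few ratings the result stays close to the prior, which is how a single bad rating ends up being trusted less than a long run of consistent ones.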

Here is a good book if you fancy discovering the history and philosophical dimensions to this old nugget.

Community
Tom
  • I use a Bayesian estimate to calculate the average rating of the content (not the popularity, though, which maybe I should). – Pking Jul 23 '13 at 16:55
  • Aha, sounds like perhaps you've just got the concepts the wrong way round. I put it to you: a true "average rating" alone isn't Bayesian, is it? It should just be the mean, expected value, etc., depending on the scenario. Popularity, however, introduces a *subjective* dimension that can only be represented in a formula by a Bayesian prior. What do you think? Am I being ideological? Or do we just have different conceptions of popularity? – Tom Jul 23 '13 at 16:57
  • It is a difficult concept. I believe the formal term is "importance", and it is different from "performance" (which is more analogous to average rating). The problem is that a rating can be an indicator of both importance and performance, because users rate things based on feelings of "I like this" or "I dislike this", i.e. a 1-star rating can mean "I'm not interested in this" and/or "the quality of this content is poor". – Pking Jul 23 '13 at 17:08
  • To clarify, the Bayesian estimate is the indicator of quality/performance. I'm looking to measure popularity/importance by looking at the rating and the number of ratings. – Pking Jul 23 '13 at 17:11
  • It doesn't need to be a measure of any particular thing. It's an indicator based on the metrics, so if need be, let's apply the same thinking to other metrics! What have you got? What do you think is important? – Tom Jul 23 '13 at 17:15
  • The only metric I considered was view count, but there should be a pretty strong correlation between view count and ratings, so I'm not sure how useful it would be. – Pking Jul 23 '13 at 17:24
  • I agree with Tom. To be precise, what you are measuring is users' preference for each movie, which is not popularity/importance. The metric is highly biased by the number of ratings, and it assumes 2.5 is the global mean, i.e. "the movie is not bad to me". I'm not sure what the definition of importance or popularity is. – Patrick the Cat Jul 23 '13 at 17:25
  • The user rating measure is "the valence and strength of the consumer's feelings for the product" (to quote another user), which can't directly be translated to either importance/popularity or quality/performance - it can be either or both, as the user rates for different reasons. A rating implies some kind of interest, a high rating may imply a high interest, and/or it may imply high quality.. so translating the rating to a measure of popularity is tricky. – Pking Jul 23 '13 at 17:53
1

First, popularity is not a well-defined concept. One may assume it is proportional to ratings, but I can also say "Movie A is popular because everyone watched it, but its quality is not as good as expected." In that case there are many ratings, but overall the ratings are not very good.

In a naive way, you can measure the average offset of ratings from the global mean for each movie.
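A minimal sketch of this naive measure follows, assuming the ratings are available as a dictionary from item to a list of star ratings (the `ratings_by_item` structure and the function name are hypothetical). The global mean is taken over every rating of every item.

```python
from statistics import mean


def average_offsets(ratings_by_item):
    """Per-item average offset of its ratings from the global mean rating."""
    all_ratings = [r for rs in ratings_by_item.values() for r in rs]
    global_mean = mean(all_ratings)
    # mean(r - global_mean) equals mean(r) - global_mean, so compute it that way.
    # Items without ratings are skipped because their offset is undefined.
    return {
        item: mean(rs) - global_mean
        for item, rs in ratings_by_item.items()
        if rs
    }


offsets = average_offsets({"a": [5, 5, 2], "b": [3], "c": [1, 2]})
print(offsets)  # "a" is above the global mean, "b" is on it, "c" is below
```

This measure ignores how many ratings each item has, so an item with one rating is treated with the same confidence as an item with thousands.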

In a more sophisticated way, you should also take into account how many ratings there are, which is hard to formulate.

Normally, if you are building a recommender system, you would use item similarity, user similarity, etc., because those measures are relative. Popularity, by contrast, should be on a bounded, absolute scale, which is rather hard to formulate correctly for recommendations.

I suggest you read the following paper if you're building a recommender system:

http://www.grouplens.org/node/475

Patrick the Cat
  • I'm building an aggregator site that should promote popular as well as quality content. So does "average offset of ratings from the global mean" mean that a stronger positive/negative reaction to content makes its popularity rating go up? – Pking Jul 23 '13 at 17:37
  • I'm not sure why the word "popularity" must occur in this case. I think what is important is what is good from the user's point of view. If we have data on that user's preferences, query output can be personalized. If not, we can take the global mean of user profiles, which becomes the "average offset of ratings from the global mean". A positive offset means a positive reaction from users overall in the rating metric. It implies "the movie is liked overall by the users we know of", but not "the movie is known by many users for good reasons". – Patrick the Cat Jul 23 '13 at 22:23
  • I strongly recommend taking a look at chapter 2 of the paper I gave above. – Patrick the Cat Jul 23 '13 at 22:27