6

I'd like to get some help to build a like/dislike sorting algorithm to find the best entries. I thought about a way to do it, but there are two major flaws with this method and I'd like to know if there's any better way.

Here's how I thought about doing it:

The entries would be sorted by the ratio given by l/d where l = number of likes and d = number of dislikes, so that those with a higher ratio have a bigger likes count and deserve a higher up place than those with a low ratio.

There are two issues with this method:

1: if the number of dislikes is 0 the l/d will be impossible. So even if an entry has a thousand of likes and 0 dislikes it still won't get any place into the scoreboard.

2: entries with a low amount of likes and dislikes are at an advantage in comparison with those with many ratings since it takes a low amount of ratings to influence the ratio and give the entry a good score.

What do you think?

EDIT: Here's a possible alternative that fixes the 1st issue: (l + 1) / (d + 1). Any feedback on this one?

Aldwoni
  • 1,168
  • 10
  • 24
kettlepot
  • 10,574
  • 28
  • 71
  • 100
  • +1 because I'm sure this is a very common problem with a strong mathematical/statistical answer but I don't know the best solution off the top of my head. – FogleBird Jul 05 '11 at 21:35
  • 1
    Here's some interesting information on how Reddit does it: http://amix.dk/blog/post/19588 – FogleBird Jul 05 '11 at 22:07
  • @FogleBird [the XKCD version](http://blog.reddit.com/2009/10/reddits-new-comment-sorting-system.html) – Mateen Ulhaq Jul 05 '11 at 22:22

3 Answers3

13

This might be relevant: How Not To Sort By Average Rating.

mhum
  • 2,928
  • 1
  • 16
  • 11
2

To remove the division by zero, you might add 1 to the numerator and denominator to obtain (l+1)/(d+1). If you want to more highly rank entries with more likes, then you might multiply your ranking formula by log(number of likes + 1). Here the one is added to remove the mathematical error that results if the entry has zero likes. For the discussion that follows, assume that the log has a base of 10. So a ranking formula that meets the requirements would be (likes + 1)/(dislikes + 1) * log(likes + 1).

Observe that this formula provides a rank of 0 if there are no likes because log(1) = 0. Suppose that the votes are tied with one like vote and one dislike vote. Then the rank is 2/2*log(2) = 0.3 because log(2) = 0.3. Now consider another tie with 9 likes and 9 dislikes. Then the rank is 10/10*log(10) = 1, because log(10) = 1. That is, the log(likes) term ranks ties with more likes more highly than ties with fewer likes.

Seth Difley
  • 1,330
  • 1
  • 23
  • 33
  • After writing this, I noticed your edit that provided the (l+1)/(d+1) fix. – Seth Difley Jul 05 '11 at 22:20
  • 2
    Nice, the only problem being that intuitively you'd want something with 0 likes and 0 dislikes score higher than 0 likes and 1000 dislikes. – biziclop Jul 05 '11 at 22:44
0

This has worked the best for me.

rank = likes * 100 / (likes + dislikes)

It orders by higher likes, then any like and/or dislike activity, then no activity. examples:

likes, dislikes => rank
0, 0 => 0            //avoid /0 error
3, 3 => 50
3, 0 => 100
hIpPy
  • 4,649
  • 6
  • 51
  • 65