
Imagine I have a very long list of images and I want to rank them in order of how 'good' people think they are.

I don't want to have users assign scores to images outright (1-10, etc.) and order by that; I'd like to try something new.

What I was thinking would be an interesting way of doing it is:

  • Show a user two random images and have them pick the better one
  • Collect lots of 'comparisons' (see the rough sketch after this list)
  • Use all the comparisons to come up with some ordering
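
To make the collection step concrete, here's a rough sketch of what I have in mind in Python; the filenames and the in-memory list are just placeholders for whatever storage ends up being used.

```python
import random

# Placeholder catalogue of images and an in-memory store of votes;
# in practice these would live in a database.
images = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
comparisons = []  # list of (winner, loser) pairs, one per user vote


def next_pair():
    """Pick two distinct random images to show the user side by side."""
    return random.sample(images, 2)


def record_choice(chosen, other):
    """Store the user's pick; the ordering step later consumes these pairs."""
    comparisons.append((chosen, other))


left, right = next_pair()
record_choice(left, right)  # pretend the user clicked the left image
```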

It turns out this approach is used regularly; for example (using features rather than images), it appears to be the way Uservoice's Smartvote works.

My question is whether there's a good, known way to take this long list of comparisons and build a relative ranking of all the images from it, without the level of complexity found in the research papers.

I've read a bunch of lectures and research papers, but I was wondering if there's any sample code out there that people might recommend?

Pete Hamilton

2 Answers


It seems like you could just use a numerical rating system and then sort based on that. Borrow the algorithm from a win/loss sport such as chess, and treat each image comparison as a bout.

I did some looking; here's some sample code showing what an algorithm like that looks like in Java.

And here's a library you can borrow in Python.

If you search ELO you'll find a version of it in just about any language. Once you get your numerical image ratings, you can sort them any way you like. There are probably other ranking algorithms you could look into for win/loss competition; that was just the first one that came up when I googled chess ranking.
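
To make that concrete, here's a minimal Elo-style sketch in Python (not the code from the links above); the starting rating of 1200, the K-factor of 32, and the filenames are just placeholders.

```python
from collections import defaultdict


def elo_update(winner, loser, k=32):
    """Return updated (winner, loser) ratings after one bout."""
    expected_win = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))
    return winner + k * (1.0 - expected_win), loser - k * (1.0 - expected_win)


def rank_images(comparisons, start=1200.0, k=32):
    """comparisons: iterable of (winner_id, loser_id) pairs collected from users."""
    ratings = defaultdict(lambda: start)
    for winner, loser in comparisons:
        ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], k)
    # Highest rating first; this is the final ordering of the images.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Example usage with placeholder filenames; each tuple is (preferred, rejected).
print(rank_images([("cat.jpg", "dog.jpg"), ("cat.jpg", "bird.jpg"), ("dog.jpg", "bird.jpg")]))
```

The K-factor controls how much a single bout moves a rating; a larger K adapts faster but is noisier.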

NathanTempelman
  • Good shout. I looked into ELO and a few other rankings; it seems the main downside is that a lot of algorithms for pairwise ranking assume that 'everyone plays everyone', which in my case isn't feasible. I think I need more comparisons before I pronounce ELO a success. – Pete Hamilton May 24 '14 at 14:37
  • ELO doesn't appear to need everyone to play everyone, but the more comparisons per image there are, the more meaningful their rating will be. That doesn't mean you need n^2 comparisons though. Even something like twenty comparisons per image would be statistically meaningful, and over time the rankings would be more and more accurate. – NathanTempelman May 26 '14 at 14:13

For every image, count the number of times it won a duel, and divide by the number of duels it took part in. This ratio is your ranking score.

Example:

Five duels among four images,

A vs B, A vs C, A vs D, B vs C, B vs D,

might yield

B: 67%, C: 50%, D: 50%, A: 33%

if B won two of its three duels, C and D each won one of their two, and A won one of its three.

Unless you perform a huge number of comparisons, there will be many ties.
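
Here's a minimal sketch of that counting in Python; the duel outcomes are just one possible assignment of winners that reproduces the percentages in the example above.

```python
from collections import defaultdict


def win_ratios(duels):
    """duels: iterable of (winner, loser) pairs. Returns {image: wins / duels played}."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser in duels:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
    return {image: wins[image] / played[image] for image in played}


# One assignment of winners consistent with the example above (winner listed first):
# B beats A and D, C beats B, A beats C, D beats A.
duels = [("B", "A"), ("A", "C"), ("D", "A"), ("C", "B"), ("B", "D")]
print(sorted(win_ratios(duels).items(), key=lambda kv: kv[1], reverse=True))
# -> B: 0.67, C: 0.50, D: 0.50, A: 0.33
```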

  • I can think of a lot of cases in which this would go very wrong, but even assuming inconsistent data, with good random sampling this could be decent. – Slater Victoroff May 22 '14 at 13:43
  • Yep, randomization is recommended. –  May 22 '14 at 13:45
  • I've tested this with random 'duels' and on the surface it looks like a fairly intuitive algorithm. Is this an established method of ranking? @SlaterTyranus What are the potential pitfalls you thought of? Having read a lot of complex solutions, there must be a reason this isn't the one generally used. I imagine the fact that draws are kind of treated as losses here isn't great? – Pete Hamilton May 24 '14 at 14:40
  • @PeterHamilton There are a great number of them, largely around the fallacy that each comparison is equal. If you imagine most cases where comparisons are anisotropic, conflicts occur. That said, it comes out as a decent approximation with a lot of trials and good randomization. – Slater Victoroff May 25 '14 at 05:28
  • It's a good idea. The main drawback of this algorithm though is that it doesn't take into account the relative rankings of two competing images. A win against the top ranked image is considered the same as a win against the bottom ranked image. That difference in ranking is valuable data that's thrown away here. Factoring that in could make a huge difference in how quickly any new images' rankings would become statistically significant. – NathanTempelman May 26 '14 at 14:24
  • I found that in testing, this fell down quite often (as in, gave strange rankings or orderings I wouldn't have expected), presumably due to the issues you've outlined. Thanks to all who commented, but ELO has proven (so far) to work better for my purposes. – Pete Hamilton May 27 '14 at 17:16