0

I have a database that contains:

user_id | category_id | liked_id | disliked_id

(thanks to stack overflow users for helping me get my database setup properly in the first place!!) Last time I used food as an example but this time I'm going to use people.

The user is given 2 images (male vs male or female vs female) and he/she simply chooses which one he/she thinks is more attractive. The user repeats this process as long as he/she wishes. Each selection is entered into the database showing which person they liked and which they disliked (also a button would be available if you think the two are similar).

Now that I have my table full of entries, I'm trying to develop an algorithm that will take all of those "votes" and translate it into a ranked list of who the user finds most attractive (based on hundreds or maybe even thousands of ranking entries).

I've been at the drawing board for hours and can't seem to think of an effective way of doing this.

Any help would be appreciated.

P.S.: The idea is also to have this be a multi-user thing, where other users can see your "like" tables and also have globally averaged tables showing how all users in general rank things.

Soner Gönül
  • 97,193
  • 102
  • 206
  • 364
Jonathan Plumb
  • 417
  • 1
  • 4
  • 12
  • What database are you using? Do you have any query already even if not working? – Lloyd Santos Feb 20 '13 at 09:50
  • The database isn't implemented yet, only designed on paper. Right now I have my classes storing the data similarly in C# (i.e.: my Vote class contains user_id, category_id, liked_id, and disliked_id strings and I've created a List instance). – Jonathan Plumb Feb 20 '13 at 09:52

1 Answers1

0

So you posted your question in the c# group. I want to give you, however, a solution that is implemented in the database, making it more independent of your program.

What you probably want to do first is to get the number of times an image has been liked and disliked. This SQL statement should do that for you (if you are using a database supporting grouping sets it would probably be easier to write):

SELECT t1.liked_id as id, t1.c_liked, t2.c_disliked
FROM 
    (SELECT liked_id, COUNT(*) as c_liked FROM table GROUP BY liked_id) t1
LEFT JOIN
    (SELECT disliked_id, COUNT(*) c_disliked FROM table GROUP BY disliked_id) t2
ON
    t1.liked_id = t2.disliked_id

Then it's up to you what you do with the numbers. In the outermost SELECT-statement, you could put a very complicated function, e.g. you could choose to weigh the dislikes less than the likes. To give you an idea of a possible very simple function:

SELECT t1.liked_id as id, 
(t1.c_liked/(t1.c_liked + t2.c_disliked) -  t2.c_disliked/(t1.c_liked + t2.c_disliked)) as score

This returns you values [-1, 1] (which you could normalize to [0, 1] if you like, but don't have to), which you then can sort as in this example:

SELECT t1.liked_id as id, 
    (t1.c_liked/(t1.c_liked + t2.c_disliked) -  t2.c_disliked/(t1.c_liked + t2.c_disliked)) as score    
FROM 
    (SELECT liked_id, COUNT(*) as c_liked FROM table GROUP BY liked_id) t1
LEFT JOIN
    (SELECT disliked_id, COUNT(*) c_disliked FROM table GROUP BY disliked_id) t2
ON
    t1.liked_id = t2.disliked_id
ORDER BY score
navige
  • 2,447
  • 3
  • 27
  • 53
  • Thanks for the comment, but here's the problem. Images are selected at random. A user might get image A 100 times, image B 10 times, and image C 1 time. The user has indicated that he likes A above B every time they have been compared. Then C pops up with A and the user says he liked C better than A. Well A has 100 votes and C only has 1, but C's vote was higher than A, so in the rankings C should be higher. This is the problem. Of course instead of being random I could have images be displayed as evenly as possible... – Jonathan Plumb Feb 20 '13 at 10:10
  • Well the question is how you want to define your ranking function. Consider the function I put above you would get the following ranking for your example: C > A > B (C at first place because it has 1 like and no dislike, then A because it has 100 likes but 1 dislike, too, then B). If you, however, rather wish to have something like A > C > B, because A has been liked generally more often than C, then you could use a function like t1.c_liked - t2.c_disliked. It is really up to how complicated you want to make the ranking function. – navige Feb 20 '13 at 10:20
  • I think I may have solved it. I'm just going to give each item a number value. If the two items have the same value, the one you vote for gets +1. If the one you voted for was less than the one you didn't, it's value goes to +1 of the one you didn't vote for. Would this work or is it flawed? – Jonathan Plumb Feb 20 '13 at 10:22
  • 1
    Yes that should work, too. Maybe have a look at the question [http://stackoverflow.com/questions/164831/how-to-rank-a-million-images-with-a-crowdsourced-sort?rq=1](http://stackoverflow.com/questions/164831/how-to-rank-a-million-images-with-a-crowdsourced-sort?rq=1) - I think it uses a similar idea. – navige Feb 20 '13 at 10:25
  • Also have a look at [http://en.wikipedia.org/wiki/Ranked_Pairs](http://en.wikipedia.org/wiki/Ranked_Pairs) and [http://en.wikipedia.org/wiki/Condorcet_method](http://en.wikipedia.org/wiki/Condorcet_method) – navige Feb 20 '13 at 10:30