
Suppose I have a file with a one-liner (joke) on each line. I want to sort the jokes by how funny I find them. My first thought is to implement any sorting algorithm (preferably one that makes as few comparisons as possible) and have the comparison function take my input; I'd just sit there and choose which joke of each pair it presented me was funnier.

There's a problem with that: my joke preference is not a total order, because it lacks transitivity. For example, I might think that B is funnier than A when shown that pair, and that C is funnier than B, but when shown A and C I somehow find A funnier than C. If “>” means “is funnier than,” then C > B and B > A do not imply C > A. Every comparison-based sorting algorithm depends on transitivity for its correctness.
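To make the A, B, C example concrete, here is a minimal sketch that scans recorded choices for intransitive triples. The `beats` set of `(winner, loser)` pairs and the `find_cycles` name are assumptions made up for illustration, not part of any library.

```python
from itertools import permutations

def find_cycles(beats):
    """Return triples (a, b, c) with a > b, b > c and c > a.

    `beats` is a set of (winner, loser) pairs (hypothetical format).
    """
    items = sorted({joke for pair in beats for joke in pair})
    cycles = []
    for a, b, c in permutations(items, 3):
        # Report each cycle once, starting from its smallest element.
        if a < b and a < c and {(a, b), (b, c), (c, a)} <= beats:
            cycles.append((a, b, c))
    return cycles

# The choices from the example: B > A, C > B, but A > C.
beats = {("B", "A"), ("C", "B"), ("A", "C")}
print(find_cycles(beats))
```

Any triple this reports is a place where a standard comparison sort could produce an order that depends on which pairs it happened to ask about.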

But it still seems that there should be an algorithm that sorts the list of jokes so that the one at the top is most preferred over other jokes, and the one at the bottom is least preferred over other jokes, even if there are individual exceptions.

I don’t know how to Google this. Is there an algorithm for this kind of preference sorting? The answer here is not applicable because it forces the user’s preference to be transitive.

nebuch
  • Recommendation System should work – Raghav Nov 13 '16 at 04:46
  • Can you complete your example? If you had three jokes A, B, C, and you found B funnier than A, C funnier than B, and A funnier than C, what order would you *expect* to see them in your output? Also are you the only one giving input? – Jason C Nov 13 '16 at 04:48
  • Note that your joke preference is not even a partial order! I wonder if there are any voting systems that allow voters to say "A > C" without providing any other votes. If so, then you might be able to use one of those systems, simply pretending that each of your comparison-preferences is from a different voter. – ruakh Nov 13 '16 at 04:48
  • @JasonC There would be no way to sort just A, B, C, so if the file was just them the order would be arbitrary. When they are part of a larger list one will be preferred more. – nebuch Nov 13 '16 at 04:49
  • @nebuch Ok, so if those three jokes are part of a larger list, in what order would you expect them to appear in that larger list? – Jason C Nov 13 '16 at 04:50
  • Will you have the time (and interest) to compare every single pair of jokes? Or do you need a heuristic that infers transitivity in cases where you haven't compared a given pair of jokes? – ruakh Nov 13 '16 at 04:51
  • @JasonC If A was preferred over more other jokes than any other, it would be at the top – nebuch Nov 13 '16 at 04:52
  • (Because if you *do* compare every single pair of jokes, then you can actually just count how many jokes each joke is funnier than, and then do a straight-up deterministic sort over those counts.) – ruakh Nov 13 '16 at 04:52
  • @nebuch Then you just answered your own question. Track the number of times you vote a given joke as superior in a pair, and sort by that. – Jason C Nov 13 '16 at 04:52
  • @JasonC see ruakh's comment: I can do statistically better with fewer comparisons if I use heuristics other than preference frequency – nebuch Nov 13 '16 at 04:54
  • @nebuch Ok, so, for each joke, pick a small random sample of other jokes to compare it to (by voting). Same number of comparisons per joke. The larger the sample size is, the closer each joke's relative vote count will get to its "actual" value. – Jason C Nov 13 '16 at 04:55
  • I don't quite get it... does each joke have to keep a list of its relations to the other jokes? What if each joke is just assigned a floating-point number (when up- or down-voted) that is halfway between the numbers of the two jokes it is moved between? – Slai Nov 13 '16 at 14:15
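The win-counting idea from the comments above can be sketched roughly as follows. This is only an illustration under assumptions: `prefer(a, b)` is a hypothetical callback that returns whichever of the two jokes was chosen (in practice it would prompt the user), and `max_pairs` is a made-up parameter for comparing only a random subset of pairs. Jokes are ranked by the fraction of their comparisons they won, so an uneven sample does not favor jokes that simply appeared more often.

```python
import random
from itertools import combinations

def rank_by_win_rate(jokes, prefer, max_pairs=None, seed=None):
    """Order jokes by the share of pairwise comparisons each one won."""
    pairs = list(combinations(jokes, 2))
    if max_pairs is not None and max_pairs < len(pairs):
        # Compare only a random subset of pairs to save effort.
        pairs = random.Random(seed).sample(pairs, max_pairs)

    wins = {joke: 0 for joke in jokes}
    seen = {joke: 0 for joke in jokes}
    for a, b in pairs:
        wins[prefer(a, b)] += 1
        seen[a] += 1
        seen[b] += 1

    def rate(joke):
        return wins[joke] / seen[joke] if seen[joke] else 0.0

    # sorted() is stable, so tied jokes keep their input order.
    return sorted(jokes, key=rate, reverse=True)

# Hypothetical recorded choices: A beats everything; B, C, D form a cycle.
beats = {("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("C", "D"), ("D", "B")}

def prefer(a, b):
    return a if (a, b) in beats else b

print(rank_by_win_rate(["D", "C", "B", "A"], prefer))
```

With every pair compared this needs n(n-1)/2 answers; passing `max_pairs` trades accuracy for fewer questions.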

1 Answer


If you represent your decisions as a directed graph, where each joke is a node and each directed edge indicates one joke being better than the other, then you can retrieve an ordering by constructing the path which follows the edges and visits each node exactly once.

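This type of graph is called a tournament (as long as every pair of jokes has been compared, so there is exactly one directed edge between any two nodes), and the path is a Hamiltonian path. I've got good news for you, Bub: every tournament is guaranteed to contain a Hamiltonian path (Rédei's theorem), so keep adding edges until every pair is covered. If the graph is also strongly connected, meaning every node can be reached from every other node while obeying the direction of the edges, it even contains a Hamiltonian cycle; in that case the ordering can start at any joke, which matches the A, B, C situation where the order is arbitrary.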

Tournament: https://en.wikipedia.org/wiki/Tournament_(graph_theory)

Hamiltonian Path: https://en.wikipedia.org/wiki/Hamiltonian_path
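One way to build such a path, sketched here under assumptions, is to insert the jokes one at a time using binary search. `prefer(a, b)` is a hypothetical callback that returns whichever joke was chosen, and it is assumed to answer consistently whenever the same pair is asked again. The binary search does not need transitivity: it only maintains that the joke just before the insertion point beat the new joke and the joke just after it lost to the new joke.

```python
def hamiltonian_path(jokes, prefer):
    """Return an order in which each joke beat the one right after it."""
    path = []
    for joke in jokes:
        lo, hi = 0, len(path)
        # Invariant: path[lo - 1] beat `joke` (if lo > 0) and
        # `joke` beat path[hi] (if hi < len(path)).
        while lo < hi:
            mid = (lo + hi) // 2
            if prefer(joke, path[mid]) == joke:
                hi = mid
            else:
                lo = mid + 1
        path.insert(lo, joke)
    return path

# The cyclic example from the question: B > A, C > B, A > C.
beats = {("B", "A"), ("C", "B"), ("A", "C")}

def prefer(a, b):
    return a if (a, b) in beats else b

print(hamiltonian_path(["A", "B", "C"], prefer))
```

This asks about O(n log n) pairs rather than all of them. The result depends on the input order when preferences contain cycles, so it guarantees only that each joke beat its immediate successor, not that it beat everything below it.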

vowel-house-might