How do I sort a list probabilistically based on properties of their elements?

Question

Consider this sample of User objects:

import numpy as np


class User:
    def __init__(self, name, rating, actual_rating):
        self.name: str = name
        self.rating: int = rating

        # Actual States
        self.actual_rating: int = actual_rating

users = []
for actual_rating in np.random.binomial(10000, 0.157, 1000):
    users.append(
        User(str(random()), 1500, actual_rating)
    )


# Sorting Users Randomly
sorted_users = []

How do I sort this users list such that the likelihood that an object is lower in index in the sorted_users depends on actual_rating being higher. For instance a random User("0.5465454", 1500, 1678) will have a higher likelihood of being sorted to be at index 0 of the sorted_users list than say User("0.7689989", 1500, 1400).

If possible is there a neat and readable way to do this?

you could use `sorted_users = sorted(users, key=lambda x: x.actual_rating)`? — Andrew, Oct 19 '20 at 09:43
see https://stackoverflow.com/questions/403421/how-to-sort-a-list-of-objects-based-on-an-attribute-of-the-objects — balderman, Oct 19 '20 at 09:43
I can't seem to figure out what function to use to get it sort based on probability. Is there perhaps a numpy function that does this? — Vivek Joshy, Oct 19 '20 at 09:44
It sounds like you want to map the decimal value to some kind of probability distribution and then sort based on where the value lies in the distribution? Or is it just that you want the sort to sometimes work and sometimes fail depending on some kind of probability? — Andrew, Oct 19 '20 at 09:45
@Andrew The list should be sorted somewhat randomly, not based purely on the values. So even 1400 might have a chance to be at index 0 once in a while though it would be unlikely. Something like what you said, but without doing a decimal map. The distribution is a binomial one I want. — Vivek Joshy, Oct 19 '20 at 09:47
@AbhinavMathur Higher likelihood, not 100% guarantee an element will be sorted at index 0 if there is a higher value. Please read the question carefully. — Vivek Joshy, Oct 19 '20 at 09:49
How about doing a first pass where you generate, for each user, a random number from a Gaussian distribution with mean `actual_rating`? Then you sort according to this random number instead of sorting according to `actual_rating` directly. — Stef, Oct 19 '20 at 09:53
Another possibility is to sort the list according to `actual_rating`, then shuffle it slightly. See for instance [How to lightly shuffle a list in python?](https://stackoverflow.com/questions/62436299/how-to-lightly-shuffle-a-list-in-python) — Stef, Oct 19 '20 at 09:57
@Stef that sound like something that would work. I guess the [numpy.random.normal](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.normal.html#numpy.random.normal) function works for this. Thanks for the help. If you answer the question I will mark correct. — Vivek Joshy, Oct 19 '20 at 09:58

Stef · Accepted Answer · 2020-10-19T12:55:26.440

Generating a random value for each user, then sorting according to this value

How about doing a first pass where you generate, for each user, a random number from a Gaussian distribution with mean actual_rating? Then you sort according to this random number instead of sorting according to actual_rating directly.

stddev = 1.0   # the larger this number, the more shuffled the list - the smaller, the more sorted the list

sorted_users = sorted(users, key=lambda u:np.random.normal(u.actual_rating, stddev))

Note the parameter stddev which you can adjust to suit your needs. The higher this parameter, the more shuffled the list in the end.

Sorting the list, then shuffling it lightly

Inspired by How to lightly shuffle a list in python?

Sort the list according to actual_rating, then shuffle it lightly.

sorted_users = sorted(users, key=lambda u:u.actual_rating)

nb_passes = 3
proba_swap = 0.25

for k in range(nb_passes):
  for i in range(k%2, len(sorted_users) - 1, 2):
    if random() < proba_swap:
      sorted_users[i], sorted_users[i+1] = sorted_users[i+1], sorted_users[i]

Note the two parameters nb_passes (positive integer) and proba_swap (between 0.0 and 1.0) which you can adjust to better suit your needs.

Instead of using a fixed parameter proba_swap, you could make up a formula for a probability of swapping that depends on how close the actual_ratings of the two users are, for instance def proba_swap(r1,r2): return math.exp(-a*(r1-r2)**2)/2.0 for some positive parameter a.

Or alternatively:

sorted_users = sorted(users, key=lambda u:u.actual_rating)

nb_swaps = int(1.5 * len(sorted_users))  # parameter to experiment with

for i in random.choices(range(len(sorted_users)-1), k=nb_swaps):
    sorted_users[i], sorted_users[i+1] = sorted_users[i+1], sorted_users[i]

How do I sort a list probabilistically based on properties of their elements?

1 Answers1

Generating a random value for each user, then sorting according to this value

Sorting the list, then shuffling it lightly

See also