0

Consider this sample of User objects:

import numpy as np


class User:
    def __init__(self, name, rating, actual_rating):
        self.name: str = name
        self.rating: int = rating

        # Actual States
        self.actual_rating: int = actual_rating

users = []
for actual_rating in np.random.binomial(10000, 0.157, 1000):
    users.append(
        User(str(random()), 1500, actual_rating)
    )


# Sorting Users Randomly
sorted_users = []

How do I sort this users list such that the likelihood that an object is lower in index in the sorted_users depends on actual_rating being higher. For instance a random User("0.5465454", 1500, 1678) will have a higher likelihood of being sorted to be at index 0 of the sorted_users list than say User("0.7689989", 1500, 1400).

If possible is there a neat and readable way to do this?

Vivek Joshy
  • 974
  • 14
  • 37
  • 2
    you could use `sorted_users = sorted(users, key=lambda x: x.actual_rating)`? – Andrew Oct 19 '20 at 09:43
  • 1
    see https://stackoverflow.com/questions/403421/how-to-sort-a-list-of-objects-based-on-an-attribute-of-the-objects – balderman Oct 19 '20 at 09:43
  • I can't seem to figure out what function to use to get it sort based on probability. Is there perhaps a numpy function that does this? – Vivek Joshy Oct 19 '20 at 09:44
  • It sounds like you want to map the decimal value to some kind of probability distribution and then sort based on where the value lies in the distribution? Or is it just that you want the sort to sometimes work and sometimes fail depending on some kind of probability? – Andrew Oct 19 '20 at 09:45
  • @Andrew The list should be sorted somewhat randomly, not based purely on the values. So even 1400 might have a chance to be at index 0 once in a while though it would be unlikely. Something like what you said, but without doing a decimal map. The distribution is a binomial one I want. – Vivek Joshy Oct 19 '20 at 09:47
  • @AbhinavMathur Higher likelihood, not 100% guarantee an element will be sorted at index 0 if there is a higher value. Please read the question carefully. – Vivek Joshy Oct 19 '20 at 09:49
  • 2
    How about doing a first pass where you generate, for each user, a random number from a Gaussian distribution with mean `actual_rating`? Then you sort according to this random number instead of sorting according to `actual_rating` directly. – Stef Oct 19 '20 at 09:53
  • Another possibility is to sort the list according to `actual_rating`, then shuffle it slightly. See for instance [How to lightly shuffle a list in python?](https://stackoverflow.com/questions/62436299/how-to-lightly-shuffle-a-list-in-python) – Stef Oct 19 '20 at 09:57
  • @Stef that sound like something that would work. I guess the [numpy.random.normal](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.normal.html#numpy.random.normal) function works for this. Thanks for the help. If you answer the question I will mark correct. – Vivek Joshy Oct 19 '20 at 09:58

1 Answers1

2

Generating a random value for each user, then sorting according to this value

How about doing a first pass where you generate, for each user, a random number from a Gaussian distribution with mean actual_rating? Then you sort according to this random number instead of sorting according to actual_rating directly.

stddev = 1.0   # the larger this number, the more shuffled the list - the smaller, the more sorted the list

sorted_users = sorted(users, key=lambda u:np.random.normal(u.actual_rating, stddev))

Note the parameter stddev which you can adjust to suit your needs. The higher this parameter, the more shuffled the list in the end.

Sorting the list, then shuffling it lightly

Inspired by How to lightly shuffle a list in python?

Sort the list according to actual_rating, then shuffle it lightly.

sorted_users = sorted(users, key=lambda u:u.actual_rating)

nb_passes = 3
proba_swap = 0.25

for k in range(nb_passes):
  for i in range(k%2, len(sorted_users) - 1, 2):
    if random() < proba_swap:
      sorted_users[i], sorted_users[i+1] = sorted_users[i+1], sorted_users[i]

Note the two parameters nb_passes (positive integer) and proba_swap (between 0.0 and 1.0) which you can adjust to better suit your needs.

Instead of using a fixed parameter proba_swap, you could make up a formula for a probability of swapping that depends on how close the actual_ratings of the two users are, for instance def proba_swap(r1,r2): return math.exp(-a*(r1-r2)**2)/2.0 for some positive parameter a.

Or alternatively:

sorted_users = sorted(users, key=lambda u:u.actual_rating)

nb_swaps = int(1.5 * len(sorted_users))  # parameter to experiment with

for i in random.choices(range(len(sorted_users)-1), k=nb_swaps):
    sorted_users[i], sorted_users[i+1] = sorted_users[i+1], sorted_users[i]

See also

After searching a little bit, I found this similar question:

Stef
  • 13,242
  • 2
  • 17
  • 28
  • @superbrain Thanks! I actually searched the doc of sort/sorted before I wrote my comment, but couldn't find this information. Now that I think about it, I should have tested sorting a small list with key `def k(x): print(x); return x`. – Stef Oct 19 '20 at 11:45
  • @superbrain Thanks, I edited to take your critics into account. – Stef Oct 19 '20 at 12:34
  • Very nice fix for the shuffling! – superb rain Oct 19 '20 at 12:38
  • Hmm, although if for example nb_passes is 1 and the list length is odd, the last element has no chance to move. So I think it's better than before, but still a bit unfair. – superb rain Oct 19 '20 at 12:44
  • @superbrain Yes, I guess it would make more sense to randomly select the elements that are swapped using `random.sample(range(len(sorted_users)-1), nb_swaps)`. – Stef Oct 19 '20 at 12:50
  • Too late to edit the comment but it should be random.choices, not random.sample – Stef Oct 19 '20 at 12:56