1

I have a list as follows:

i = [
        {'id': '1', 'P': 0.5},
        {'id': '2', 'P': 0.4},
        {'id': '3', 'P': 0.8},
        ...
        {'id': 'x', 'P': P(x)}
    ]

When I do the following:

i.sort(key=lambda x: x['P'], reverse=True)

The list gets sorted based on P where the element with the largest P is in front. but what if I want to make it seem randomized so that even a element with a small P value (with very small probability) can be the first item in the list?
Is it possible to implement that using the sort() function or do I have to write my own?

monoshiro
  • 78
  • 7
  • What is the range for P (min and max values)? – ayhan Jul 19 '16 at 10:06
  • 3
    If it's random, it's not really a sort! – jonrsharpe Jul 19 '16 at 10:07
  • 1
    Would `i.sort(key=lambda x: random.random()/x['P'])` do what you want? It gives the values with small `P` a wider range of random keys, so they'll usually appear later in the list than values with larger `P`. But any value can always get lucky with a very small result from `random.random()` and appear at the front of the list. – Blckknght Jul 19 '16 at 10:14
  • @ayhan P values are between 0 and 1 – monoshiro Jul 19 '16 at 10:28
  • @Blckknght thanks I'll give it a go – monoshiro Jul 19 '16 at 10:31
  • 1
    The problem seems underdetermined. Given two items -- in terms of the `'P'`-values, what should the probability be that one of the items is listed before the other? – John Coleman Jul 19 '16 at 10:36
  • @John Coleman then order should not matter. The P value are actually cosine similarity scores of documents after TFIDF weighting. I just want the documents to be ordered in a way that is not the same everytime but still preserve some relevance – monoshiro Jul 19 '16 at 10:47
  • You could always sort the list and then randomly mutate it. – John Coleman Jul 19 '16 at 11:11
  • Why don't you sort based on the P-score +/- a small bias sampled from a standard normal distribution? This way the bias will be close to zero most of the time, but with some chance of messing up the order slightly. You can choose the standard deviation such that you get the degree of 'randomness' you are after. – Andrew Guy Jul 19 '16 at 11:57

3 Answers3

4

As I alluded to in a comment, you could sort with a level of random bias by sampling a bias factor from a standard normal distribution. You can then add this bias (which is symmetrical either side of zero) to your P value.

import numpy as np

#Give some control over the level of rearrangement - larger equals more rearrangement
bias_factor = 0.5

i.sort(key=lambda x: x['P'] + np.random.randn()*bias_factor, reverse=True)

Or if you just want to use the standard library:

from random import gauss

#Give some control over the level of rearrangement - larger equals more rearrangement
sigma = 0.5

i.sort(key=lambda x: x['P'] + gauss(0, sigma), reverse=True)
Andrew Guy
  • 9,310
  • 3
  • 28
  • 40
3

According to my understanding, you want to have the list sorted but still want some some randomness. A list cannot be sorted as well as random at the same time. The nearest can be achieved via something like

i.sort(key=lambda x: x['P']*random.triangular(0.6, 1.4), reverse=True)

This while ensuring that the order is not exactly random as well as not sorted in a similar way.

The values 0.6 and 1.4 can be changed depending on the deviation you want.

Siddharth Gupta
  • 1,573
  • 9
  • 24
2

Here is my take on it: Suppose that item 1 has similarity score 0.3 and item 2 has similarity score 0.4. Then, it seems reasonable that when choosing between these two items, item 1 should be before item 2 with probability 0.3 / (0.3 + 0.4). This way the higher-scored item is usually before the lower-scored. The following functions implement a variant of merge-sort which incorporates this idea:

import random

def pick(a,b,key = None):
    ka = key(a) if key else a
    kb = key(b) if key else b
    if ka+kb == 0:
        return random.int(0,1)
    else:
        p = ka/(ka+kb)
        return 0 if random.random() <= p else 1

def randMerge(xs,ys,key = None):
    merged = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        k = pick(xs[i],ys[j],key)
        if k == 0:
            merged.append(xs[i])
            i += 1
        else:
            merged.append(ys[j])
            j += 1
    if i == len(xs):
        merged.extend(ys[j:])
    else:
        merged.extend(xs[i:])
    return merged

def randSort(items,key = None):
    if len(items) < 2:
        return items
    else:
        n = len(items)//2
        xs = items[:n]
        ys = items[n:]
        return randMerge(randSort(xs,key),randSort(ys,key),key)

For testing:

i = [
        {'id': '1', 'P': 0.5},
        {'id': '2', 'P': 0.4},
        {'id': '3', 'P': 0.8},
        {'id': '4', 'P': 0.9}
    ]

For example:

>>> for j in range(5): print(randSort(i,key = lambda x: x['P']))

[{'P': 0.5, 'id': '1'}, {'P': 0.9, 'id': '4'}, {'P': 0.8, 'id': '3'}, {'P': 0.4, 'id': '2'}]
[{'P': 0.8, 'id': '3'}, {'P': 0.5, 'id': '1'}, {'P': 0.9, 'id': '4'}, {'P': 0.4, 'id': '2'}]
[{'P': 0.9, 'id': '4'}, {'P': 0.5, 'id': '1'}, {'P': 0.8, 'id': '3'}, {'P': 0.4, 'id': '2'}]
[{'P': 0.5, 'id': '1'}, {'P': 0.8, 'id': '3'}, {'P': 0.9, 'id': '4'}, {'P': 0.4, 'id': '2'}]
[{'P': 0.8, 'id': '3'}, {'P': 0.4, 'id': '2'}, {'P': 0.9, 'id': '4'}, {'P': 0.5, 'id': '1'}]
John Coleman
  • 51,337
  • 7
  • 54
  • 119