
To give a little background, I am coding a genetic algorithm to solve the Traveling Salesman Problem (TSP). In my population, I have an ordered list of paths from shortest to longest (fittest to least fit), and their respective distances, like this:

population = [
[[a, b, c, d], [10.12]],
[[b, c, a, d], [11.33]],
[[d, a, c, b], [11.5]],
[[b, a, d, c], [12.07]]
...]

After the population is ordered by fitness, I need to kill half of its members randomly, but in such a way that the fitter a member is, the better its chance of surviving.

  1. I've tried using random.choices(), passing a list of probabilities (the bias) into the weights parameter and half the original population size as k, like this:

    # weights decrease with rank, e.g. [1.0, 0.75, 0.5, 0.25, ...]
    bias_weights = [(len(population) - x) / len(population) for x in range(len(population))]
    
    random.choices(population, weights=bias_weights, k=len(population) // 2)
    

    The problem with the code above is that it produces duplicates in my list, and it's very messy to get rid of them and keep the population size at 50%.

  2. I've also tried np.random.choice() from the numpy library, but it requires the list I pass to be 1-D and the weights to sum to 1.
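
    A minimal sketch of that failure, with toy values (just to show the two constraints):

    import numpy as np

    a, b, c, d = 1, 2, 3, 4
    population = [
    [[a, b, c, d], [10.12]],
    [[b, c, a, d], [11.33]],
    ]

    # np.random.choice() raises a ValueError here: the candidates must form
    # a 1-D array, and the p= probabilities must sum to exactly 1
    try:
        np.random.choice(population, size=1, p=[0.9, 0.3])
    except ValueError as err:
        print(err)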

Is there any other way to do this?

Sergey Ronin

4 Answers


I would still use np.random.choice(). Solve the first problem by asking np.random.choice() to choose the index of the path rather than the path itself, and pass replace=False so that no index is drawn twice. Solve the second problem by scaling the weights so they sum to 1.

import numpy as np

a, b, c, d = 1, 2, 3, 4

population = [
[[a, b, c, d], [10.12]],
[[b, c, a, d], [11.33]],
[[d, a, c, b], [11.5]],
[[b, a, d, c], [12.07]]
]

# Build the probability array: weights decrease with rank, so the fittest
# (first) member gets the largest weight and every weight is non-zero
bias_weights = [(len(population) - x) / len(population) for x in range(len(population))]
prob = np.array(bias_weights) / np.sum(bias_weights)

# Draw unique random indices in [0, len(prob)), according to prob;
# replace=False guarantees no index is chosen twice
sample_size = 2
choice_indices = np.random.choice(len(prob), size=sample_size, replace=False, p=prob)

# Get corresponding paths
paths = [population[i][0] for i in choice_indices]
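
For the 50% cull in the question, the same call scales directly; because every weight above is non-zero, replace=False can always find enough distinct indices (a sketch reusing the names above):

sample_size = len(population) // 2
survivors = [population[i] for i in
             np.random.choice(len(prob), size=sample_size, replace=False, p=prob)]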
LarrySnyder610
  • How would I be able to scale this for 50% of the original data? The problem with the code above is that the same element could be chosen twice, which I do not want. Also, if I were to get rid of them in some kind of loop, I would also have to change the size of the `prob` list according to which element has been taken out. Very messy. – Sergey Ronin May 20 '18 at 19:35
  • @KomronAripov You can still use [`choice()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.choice.html). I updated my code above to allow an arbitrary sample size. – LarrySnyder610 May 21 '18 at 15:32
  • This produces the following error: `ValueError: Fewer non-zero entries in p than size` – Sergey Ronin Mar 27 '19 at 15:56

EDIT: Actually, I would recommend just using the following:

while <variable> not in <list>:
    <list>.append(<variable>)
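
A runnable interpretation of that pattern (the function name and the weight scheme are mine, not from the answer):

import random

def pick_survivors(population, k):
    # fitter (earlier) members get heavier weights
    weights = [len(population) - i for i in range(len(population))]
    survivors = []
    while len(survivors) < k:
        # draw one index and append it only if it is not already chosen
        idx = random.choices(range(len(population)), weights=weights, k=1)[0]
        if idx not in survivors:
            survivors.append(idx)
    return [population[i] for i in survivors]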
Colin Hancey
  • Could you possibly provide code along with your answer? I'm struggling on the technical side of things with Python. – Sergey Ronin May 19 '18 at 19:57
  • This does not seem like a good solution as the number of elements `[a, b, c and d]` can change any time (that is the nature of the program) and I would not be able to add `for` clauses dynamically. – Sergey Ronin May 19 '18 at 20:15
  • It's actually fairly easy to handle a dynamic list size with "for item in all_stuff:" and "battle_royale.append(item)", but the number of times each item should be appended to the battle_royale list requires a more intricate approach. It could be accomplished with classes, for example. Anyway, I just realized my suggestion is way overly complicated. What you have is good, just use "while ___ not in ___:" – Colin Hancey May 19 '18 at 20:22

Choose one index at a time, add it to a set to guarantee uniqueness, and continue until you have enough elements (the paths themselves are lists, which are unhashable, so collect indices rather than the members):

import random

# weights decrease with rank: fitter (earlier) members are likelier picks
bias_weights = [(len(population) - i) / len(population) for i in range(len(population))]

chosen = set()

# draw indices one at a time; the set silently discards duplicate draws
while len(chosen) < len(population) // 2:
    chosen.add(random.choices(range(len(population)), weights=bias_weights, k=1)[0])
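
The chosen indices then map back to the surviving members, e.g. `survivors = [population[i] for i in chosen]`.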
Thierry Lathuille

To get no duplicates, you have to use a random shuffle. The algorithm is called weighted random shuffle, and it is worked out at

http://nicky.vanforeest.com/probability/weightedRandomShuffling/weighted.html

A C++ version is here:

C++. Weighted std::shuffle

UPDATE: the fast weighted random shuffle from the first link, lightly cleaned up:

from random import random
from bisect import bisect_right
import numpy as np

def weighted_shuffle(a, w):
    r = np.empty_like(a)
    cumWeights = np.cumsum(w)
    for i in range(len(a)):
        # pick a point in the remaining total weight and find its owner
        rnd = random() * cumWeights[-1]
        j = bisect_right(cumWeights, rnd)
        #j = np.searchsorted(cumWeights, rnd, side='right')
        r[i] = a[j]
        # remove element j's weight so it cannot be drawn again
        cumWeights[j:] -= w[j]
    return r

a = np.arange(1,1000)
w = np.arange(1,1000)
r = weighted_shuffle(a,w)
print(r[:2])
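
Applied to the question, a sketch (it assumes population is ordered fittest-first; the linear weights are my choice, not prescribed by the algorithm):

# shuffle the indices, biased so the fitter ones tend to land early,
# then keep the first half as survivors
indices = np.arange(len(population))
weights = np.arange(len(population), 0, -1)  # fittest index gets the largest weight
shuffled = weighted_shuffle(indices, weights)
survivors = [population[i] for i in shuffled[:len(population) // 2]]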
Severin Pappadeux