Strange result of Roulette wheel selection algorithm in Python

Question

I wrote a function 'wheel_select' to pick a value from fitness data, but the results look strange. Array A is 1,000 randomly generated integers from 0 to 9 (np.random.randint(10, size = 1000) excludes 10), and array B is the real data from my experiments.

I used the function to pick 10,000 values from each array and plotted the picks as histograms. The histogram for A looks the way I expect from roulette wheel selection: larger values are picked more often. The histogram for B, however, looks like a normal distribution. Why is this? Have I misunderstood something about roulette wheel selection?
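One way to reason about the two histograms: with roulette wheel selection each element i is picked with probability f_i / sum(f), so a value v that occurs c_v times in the data is returned with probability c_v * v / sum(f). The histogram of picks is then the data's own distribution reweighted by value, not a curve that simply rises with value. The sketch below (value_selection_probs is a hypothetical helper, not from the question) computes these exact probabilities. For uniformly distributed integers like array A they grow linearly with the value; for bell-shaped data, common mid-range values can outweigh rare large ones.

```python
import numpy as np

def value_selection_probs(fit_data):
    """Exact probability that roulette wheel selection returns each distinct value.

    Element i is picked with probability f_i / sum(f), so a value v that
    occurs c_v times is returned with probability c_v * v / sum(f).
    """
    fit = np.asarray(fit_data, dtype=float)
    values, counts = np.unique(fit, return_counts=True)
    probs = values * counts / fit.sum()
    return dict(zip(values.tolist(), probs.tolist()))

# Uniform integers 0..9 (like array A): every value occurs equally often,
# so the selection probability grows linearly with the value (v / 45).
uniform_like_a = np.repeat(np.arange(10), 100)
probs_a = value_selection_probs(uniform_like_a)
print(round(probs_a[9.0], 4))  # 9 * 100 / 4500 = 0.2

# Bell-shaped toy data: 400 is the most likely pick, even though 700
# has the largest per-element chance, because 400 occurs three times.
bell_like = np.array([100, 300, 300, 400, 400, 400, 500, 500, 700])
print(value_selection_probs(bell_like))
```

Whether this fully explains the histogram for B depends on how B's own values are distributed; in the listing below most of them appear to lie between roughly 150 and 650.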

For the roulette wheel algorithm, I followed @umutto's answer in this post:

Roulette wheel selection with positive and negative fitness values for minimization
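For reference, roulette wheel selection with non-negative fitness values can be sketched by laying the values end to end as slices of a wheel (their cumulative sum) and locating each random spin with a binary search. This is a minimal sketch assuming NumPy and that every fitness is non-negative with at least one positive value; wheel_select_many is a hypothetical name, not the method from the linked answer. Because all spins are drawn in one call, it avoids building a DataFrame for every selection.

```python
import numpy as np

def wheel_select_many(fit_data, n, rng=None):
    """Return n roulette-wheel picks from fit_data in one vectorized call.

    Hypothetical helper. Assumes every fitness value is non-negative and
    at least one is positive.
    """
    rng = np.random.default_rng() if rng is None else rng
    fit = np.asarray(fit_data, dtype=float)
    edges = np.cumsum(fit)                       # upper edge of each slice
    spins = rng.uniform(0.0, edges[-1], size=n)  # n spins in [0, total)
    # First slice whose upper edge is strictly greater than the spin;
    # zero-width slices (fitness 0) are skipped by side='right'.
    idx = np.searchsorted(edges, spins, side='right')
    # Guard against a spin that rounds up to exactly the total:
    # map it to the last slice that actually has positive width.
    last_positive = np.flatnonzero(fit > 0)[-1]
    idx = np.minimum(idx, last_positive)
    return fit[idx]

# A few values copied from array B, including its 0.0 entry.
b_sample = np.array([412.85, 497.12, 0.0, 863.06])
picks = wheel_select_many(b_sample, 10_000, rng=np.random.default_rng(0))
print(np.unique(picks, return_counts=True))
```

NumPy's rng.choice(fit, size=n, p=fit / fit.sum()) should draw from the same distribution; the explicit cumulative-sum version is shown only because it mirrors the wheel.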

Here is my code (10,000 selections take about 30 seconds on my i5-9600K PC):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

a = np.random.randint(10, size = 1000)
b = np.array([412.85, 497.12, 380.16, 360.26, 657.97, 574.37, 341.08, 453.31,
       174.62, 322.36, 463.86, 520.26, 377.5 , 621.98,   0.  , 215.84,
       241.29, 459.06, 606.71, 326.91, 371.6 , 212.65, 593.39, 174.58,
       127.98, 508.01, 552.58, 283.92, 428.15, 159.24, 200.55, 232.55,
       476.71, 333.31, 650.79, 557.69, 625.52, 524.93, 343.86, 493.17,
       715.89, 519.25, 445.77, 281.99, 141.07, 573.53, 241.02, 430.73,
       421.12, 335.5 , 519.73, 428.19, 528.79, 147.82, 444.06, 373.5 ,
       411.7 , 355.18, 484.27, 541.81, 235.9 , 193.61, 365.21, 247.15,
       459.5 , 583.71, 618.82, 409.82, 412.41, 249.95, 422.85, 223.32,
       477.81, 752.4 , 184.62, 348.39, 733.64, 611.86,  91.  , 170.51,
       269.95, 318.84, 377.64, 432.73, 480.98, 260.97, 610.75, 385.83,
       814.43, 239.29, 440.42, 158.47, 421.69, 314.9 , 557.44, 287.94,
       444.23, 337.68, 382.01, 511.87, 193.06, 266.82, 424.55, 416.06,
       595.11, 357.54, 628.87, 170.68, 235.05, 539.92, 613.35, 528.5 ,
       113.4 , 324.8 , 480.45, 863.06, 121.05, 454.45, 554.42, 512.74,
       457.91, 312.65, 435.87, 354.13, 602.9 , 508.37, 640.97, 294.35,
       301.01, 477.08, 120.39, 350.18, 419.27, 308.8 , 692.49, 428.23,
       591.79, 497.73, 448.09, 429.09, 435.18, 453.19, 329.44, 641.56,
       340.88, 550.01, 528.17, 240.84, 494.9 , 295.02, 464.3 , 573.53,
       545.83, 358.09, 240.35, 417.92, 546.16, 408.97,  11.57, 421.15,
       445.21, 421.17, 230.45, 420.45, 365.38, 648.83, 518.6 , 450.72,
       536.01, 236.48, 190.02, 448.4 , 621.07, 599.4 , 709.42, 147.75,
       583.97, 276.09, 557.82, 375.96, 261.28, 400.54, 413.2 , 381.02,
       282.16, 671.85, 165.54, 455.15, 414.97, 152.37, 197.5 , 247.32,
       593.1 , 456.37, 255.5 , 464.73, 567.02, 182.85, 119.55, 420.85])

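def wheel_select(fit_data):  # roulette wheel selection function
    df = pd.DataFrame(fit_data, columns=['fitness'])
    df['probability'] = df.fitness / df.fitness.sum()  # share of the wheel for each element
    df['prob_cumsum'] = df.probability.cumsum()        # upper edge of each slice
    rand_num = np.random.uniform(0, 1)                 # spin the wheel
    # Pick the first slice whose upper edge exceeds the spin. Do not round
    # rand_num: round(0.9996, 3) == 1.0 may match no slice, and idxmax() on
    # an all-False Series then silently returns the first index.
    idx = (rand_num < df.prob_cumsum).idxmax()
    return df.loc[idx, 'fitness']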

res_a, res_b = [], []
for i in range(10000):
    res_a.append(wheel_select(a))
    res_b.append(wheel_select(b))

df = pd.DataFrame({'res_a':res_a, 'res_b':res_b})
df.hist(bins=100)
plt.show()
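To check whether a histogram of picks is actually wrong, one could compare it with the histogram roulette wheel selection should produce. In this minimal sketch (assuming NumPy; expected_pick_counts is a hypothetical helper), np.histogram is called with weights equal to the data itself, so each bin's expected count is proportional to the total fitness in that bin. The normal data here is a synthetic stand-in, not array B; running the same comparison on b and res_b over shared bin edges would show whether the picks match the expected shape.

```python
import numpy as np

def expected_pick_counts(fit_data, n_picks, bins):
    """Expected histogram of n_picks roulette-wheel picks from fit_data.

    Element i is picked with probability f_i / sum(f), so the expected
    count in a bin is n_picks * (fitness summed over that bin) / sum(f).
    """
    fit = np.asarray(fit_data, dtype=float)
    weighted, edges = np.histogram(fit, bins=bins, weights=fit)
    return n_picks * weighted / fit.sum(), edges

rng = np.random.default_rng(0)
# Synthetic bell-shaped stand-in for real data (NOT array B), clipped at 0.
data = np.maximum(rng.normal(400, 150, size=200), 0.0)
expected, edges = expected_pick_counts(data, 10_000, bins=20)

# Reference picks drawn with NumPy's built-in weighted sampling.
picks = rng.choice(data, size=10_000, p=data / data.sum())
observed, _ = np.histogram(picks, bins=edges)
for left, e, o in zip(edges[:-1], expected, observed):
    print(f"{left:7.1f}  expected {e:7.1f}  observed {o:5d}")
```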

For the roulette wheel algorithm, I followed @umutto's answer in this post: https://stackoverflow.com/questions/44430194/roulette-wheel-selection-with-positive-and-negative-fitness-values-for-minimizat (skywave1980, Oct 05 '22 at 01:32)

