0

I'm trying to apply GA to solve a problem and having couple questions.

First question is about selection - I've seen in many implementations that population is sorted according to score/fitness prior selection. Is it really necessary? For example in Tournament selection individuals should be picked uniformly so sorting seems to have no meaning. Or lets consider Roulette Wheel selection. With the following implementation sorting again seems to be unnecessary (pseudo code):

totalFitness = sum individuals i.fitness
rouletteValue = totalFitness * random 0.0 1.0
selected = null
i = 0;
while rouletteValue >= 0.0
  selected = individuals[i++]
  rouletteValue -= selected.fitness
return selected

So does it really necessary to sort population for selection?

Another question is quick convergence: I'm trying out Simple GA with crossover probability 0.9, mutation probability 0.01, population size 30 and initial population contains 'very-good-solution' (known). Crossover produces two offsprings always, so one iteration is as follows (pseudo code):

for i = 0 to population-size
    mother = select-one population
    while father != mother
        father = select-one population
    if should-crossover crossover-probability
        (sister, brother) = apply-crossover mother father 
    else
        sister = copy mother 
        brother = copy father 
    if should-mutate mutation-probability
       apply-mutation sister
    if should-mutate mutation-probability
       apply-mutation brother
    insert-at new-population i sister
    insert-at new-population i+1 brother
    i = i + 2
swap new-population population

Also because of nature of the problem crossover almost never provides better results then parents had.

So, roughly speaking, what happens is because 'good solution' is much better than any other solution in initial population it is selected for reproduction more often then other individual (lets say with probability 0.1 for 30 individuals). If crossover is not applied (probability 0.1) then 'good solution' is carried over into new population. So as population size is 30 on average crossover is skipped for 1.5 pair. So after 15 iterations 22 crossovers are skipped. Now as 'good solutions' selection probability is 0.1 then after 22 skipped crossovers I'll have 2 'good solutions' inserted into new population. After this happens 'good solution' proliferates to fast.

Is there any way to overcome this problem?

koruyucu
  • 190
  • 1
  • 9
  • In the algorithm above, the individuals selected are numbered starting at position 0 in `individuals`. So if that set is sorted (most fit first) it will select a random number of the most fit individuals. If the set is unsoted (random order) it will select random number of random indivuals. – Evil Dog Pie Nov 12 '14 at 10:53
  • What is the problem with fast convergence on the most fit solution? – Evil Dog Pie Nov 12 '14 at 10:56
  • About sorting before selection - as we select numbers uniformly 0.05, 0.5 and 0.99 are equally possible, so if most fit individual is first one it does not mean it will be selected more frequently. What matters is that it occupies bigger range, so as we select multiple times it has more chances to get selected. For example if I have 3 individuals with probabilities 0.1, 0.2 and 0.7 it should matter which one goes first - after selecting, for example 10 random numbers, last one still will be selected more frequently. – koruyucu Nov 12 '14 at 13:25
  • Anyway, I think I've found answer to the question about sorting - http://stackoverflow.com/questions/10531565/how-should-roulette-wheel-selection-be-organized-for-non-sorted-population-in-g – koruyucu Nov 12 '14 at 13:25
  • As for convergence - it is bad because solution still can be improved and I expect to be improved sognificantly – koruyucu Nov 12 '14 at 13:26
  • Ok, I understand. In that case your selection algorithm is unclear, because starting with i = 0 and using individuals[i++] to obtain the specified individual selected implies that the first iteration will always select individuals[0], etc. I.e. the first one in the list. – Evil Dog Pie Nov 12 '14 at 13:46
  • Ah, I got you wrong then. You are right - it should be: selected = individual[i]; i++ – koruyucu Nov 12 '14 at 14:29
  • That's the same. The index `i` will always be 0 first iteration, 1 second, etc. picking the indivduals in sequence. The index needs to be randomised. Within the loop use `i = random 1 count individuals` (Assuming indexing is 1 based and random is inclusive of the upper and lower bounds and can produce integer values.) I'm just being pedantic now and don't actually have an answer to your question. – Evil Dog Pie Nov 12 '14 at 15:12

1 Answers1

0

On the convergence problem - judging from the parameters you described, you are converging to your seeded known-good solution because your population is nowhere near large enough.

Assuming that the remainder of your population is randomly generated, the odds that these 29 individuals will have any components that immediately generate a more-fit offspring is extremely low. So combined with the 10% no-crossover chance, the known-good individual will consistently dominate.

To generate a more-fit individual via GA than your known-good, you will need to significantly increase your population, and probably increase the crossover probability as well. You probably will also need to weaken the selection algorithm to prevent any single strong individual from dominating the reproduction cycle in any given generation.

akira
  • 161
  • 5