Genetic algorithm initial seed diversity using value encoding C#

Question

I would like to know the following: How to effectively make initial generation of chromosomes with high diversity using value encoding ? One way is grid initialization, but it is too slow.

Till now I have been using Random class from .NET for choosing random values in value encoding but, although values are uniformly distributed, fitness function values calculated from such chromosomes are not. Here is a code for Chromosome initialization:

 public Chromosome(Random rand) 
        {
            Alele = new List<double>();
            for (int i = 0; i < ChromosomeLength; i++)
            {
                Alele.Add(rand.NextDouble() * 2000 - 1000);
            }
        }

So, I developed a function that calculates fitness from new, randomly made chromosome (upper code) and if fitness is similar to any other already in the list of chromosomes, a new chromosome is made randomly and his fitness is calculated and this process is repeated until his fitness is not different enough from those already in the list.

Here is the code for this part:

private bool CheckSimilarFitnes(List<Chromosome> chromosome, Chromosome newCandidate) 
    {
     Boolean flag=false;
     double fitFromList, fitFromCandidate;
     double fitBigger,fitSmaller;

     foreach (var listElement in chromosome)
      {  
      fitFromList = listElement.CalculateChromosomeFitness(listElement.Alele);
      fitFromCandidate = newCandidate.CalculateChromosomeFitness(newCandidate.Alele);
      fitBigger = fitFromList >= fitFromCandidate ? fitFromList : fitFromCandidate;
      fitSmaller =  fitFromList < fitFromCandidate ? fitFromList : fitFromCandidate;

            if ((fitFromList / fitFromCandidate) < 1.5) 
                return false
      }

     else return true;

    }

But, the more chromosomes I have in the list it takes longer to add a new one, with fitness that is enough different from others already in there.

So, is there a way to make this grid initialization more faster, it takes days to make 80 chromosomes like this?

Q: I don't understand. Exactly what is wrong with the standard .Net "Random()"? Just the default seed (ticks)? Q: Does this link help: http://matthewlynch.net/dotnet/good-seed-for-random/ — paulsm4, Sep 21 '12 at 23:58
It is not the seed, problem is that I have to make chromosomes and then check their fitness. If it is too similar to ones already in the list (fitness) then repeat the process of creating a new chromosome until it's fitness id different enough. But this is taking to long, so my question is is there a quicker way of making diversity in inital seed? — Antun Tun, Sep 22 '12 at 00:04
Q: does this give you anything better: `Random random = new Random(Guid.NewGuid().GetHashCode());` — paulsm4, Sep 22 '12 at 00:22

score 2 · Answer 1 · 2012-09-22T05:20:35.867

2

The basic problem here is that most randomly generated chromosomes have similar fitness, right? That's fine; the idea isn't for your initial chromosomes to have wildly different fitnesses; it's for the chromosomes themselves to be different, and presumably they are. In fact, you should expect the initial fitness of most of your first generation to be close to zero, since you haven't run the algorithm yet.

Here's why your code is so slow. Let's say the first candidate is terrible, basically zero fitness. If the second one has to be 1.5x different, that really just means it has to be 1.5x better, since it can't really get worse. Then the next one has to 1.5x better than that, and so on up to 80. So what you're really doing is searching for increasingly better chromosomes by generating completely random ones and comparing them to what you have. I bet if you logged the progress, you'd find it takes more and more time to find the subsequent candidates, because really good chromosomes are hard to find. But finding better chromosomes is what the GA is for! Basically what you've done is optimize some of the chromosomes by hand before, um, actually optimizing them.

If you want to ensure that your chromosomes are diverse, compare their content, don't compare their fitness. Comparing the fitness is the algo's job.

edited Sep 22 '12 at 05:20

answered Sep 22 '12 at 01:02

+1. I like your explanation better than mine - more related to algorithm, compared to my "pigeon hole" one. – Alexei Levenkov Sep 22 '12 at 01:19
+1. This question is actually *very* broad, and I wonder if it even has an answer. Esentially the OP is asking how to write proper genetic algorithms. I'd suggest [AI Techniques for Game Programmers by Mat Buckland](http://books.google.com/books/about/AI_Techniques_for_Game_Programming.html?id=6F9nKQdj8hwC). It's a very accessible text with lots of information on Fitness Scaling, Mutation, and Pressure. (note: I'm in no way affiliated with the book. I simply found it extremely useful as an introductory text several years ago) – cod3monk3y Sep 22 '12 at 01:38
You are wright, my code is doing "by hand" optimization for starting or initial population. But you are wrong in one thing, randomly created chromosomes, which do have similar fitness can't create anything new in next generation (uniform crossover with masked used). So I need a way to make initial seed with more diversity. As it says in all literature of GA, if initial seed doesn't have enough diversity, algorithm converges too early. – Antun Tun Sep 22 '12 at 10:07
And by the way, I am using mutation rate of 5%, uniform crossover and probabilistic binary tournament. – Antun Tun Sep 22 '12 at 10:34
Here, have a look and you'll see http://www.mediafire.com/view/?fdjzkal66hezdzo – Antun Tun Sep 22 '12 at 10:44
1

@artun Diversity is important, because otherwise you end up at a local maxima. But not diversity of *fitness*. Fitness is a measurement of how "good" the chromosome is, not anything qualitative about it. Having diverse fitness won't help the GA at all, because its big job is to promote higher fitness. If your initial chromosomes range from unfit to fit, which do you think will get promoted by GA? It defeats the whole point. – Sep 22 '12 at 19:47
Look, for example, at @cod3monk3y's code in pastebin. It calculates the stddev by comparing alleles, not fitness. – Sep 22 '12 at 19:48
Ok, looks like I need to make random chromosomes with more variance. I posted my code ath the end of his post – Antun Tun Sep 22 '12 at 20:34
That may well be, hard to say. I'm only saying that comparing the fitnesses for diversity purposes confuses the issue and makes the task impossible. Reading the thread below with cod3monk3y, I think they've got you covered. – Sep 22 '12 at 21:00

cod3monk3y · Accepted Answer · 2012-09-23T19:53:01.857

here's some code that might help (which I just wrote): GA for ordering 10 values spaced by 1.0. It starts with a population of 100 completely random alleles, which is exactly how your code starts.

The goal I gave the GA to solve was to order the values in increasing order with a separation of 1.0. It does this in the fitness function Eval_OrderedDistance by by computing the standard deviation of each pair of samples from 1.0. As the fitness tends toward 0.0, the alleles should start to appear in sequential order.

Generation 0's fittest Chromosome was completely random, as were the rest of the Chromosomes. You can see the fitness value is very high (i.e., bad):

GEN: fitness   (allele, ...)
  0: 375.47460 (583.640, -4.215, -78.418, 164.228, -243.982, -250.237, 354.559, 374.306, 709.859, 115.323)

As the generations continue, the fitness (standard deviation from 1.0) decreases until it's nearly perfect in generation 100,000:

  100: 68.11683 (-154.818, -173.378, -170.846, -193.750, -198.722, -396.502, -464.710, -450.014, -422.194, -407.162)
  ...
10000:  6.01724 (-269.681, -267.947, -273.282, -281.582, -287.407, -293.622, -302.050, -307.582, -308.198, -308.648)
  ...
99999:  0.67262 (-294.746, -293.906, -293.114, -292.632, -292.596, -292.911, -292.808, -292.039, -291.112, -290.928)

The interesting parts of the code are the fitness function:

// try to pack the aleles together spaced apart by 1.0
// returns the standard deviation of the samples from 1.0
static float Eval_OrderedDistance(Chromosome c) {
    float sum = 0;
    int n = c.alele.Length;
    for(int i=1; i<n; i++) {
        float diff = (c.alele[i] - c.alele[i-1]) - 1.0f; 
        sum += diff*diff; // variance from 1.0
    }

    return (float)Math.Sqrt(sum/n);
}

And the mutations. I used a simple crossover and a "completely mutate one allele":

Chromosome ChangeOne(Chromosome c) {
    Chromosome d = c.Clone();
    int i = rand.Next() % d.alele.Length;
    d.alele[i] = (float)(rand.NextDouble()*2000-1000);
    return d;
}

I used elitism to always keep one exact copy of the best Chromosome. Then generated 100 new Chromosomes using mutation and crossover.

It really sounds like you're calculating the variance of the fitness, which does of course tell you that the fitnesses in your population are all about the same. I've found that it's very important how you define your fitness function. The more granular the fitness function, the more you can discriminate between your Chromosomes. Obviously, your fitness function is returning similar values for completely different chromosomes, since your gen 0 returns a fitness variance of 68e-19.

Can you share your fitness calculation? Or what problem you're asking the GA to solve? I think that might help us help you.

[Edit: Adding Explicit Fitness Sharing / Niching]

I rethought this a bit and updated my code. If you're trying to maintain unique chromosomes, you have to compare their content (as others have mentioned). One way to do this would be to compute the standard deviation between them. If it's less than some threshold, you can consider them the same. From class Chromosome:

// compute the population standard deviation
public float StdDev(Chromosome other) {
    float sum = 0.0f;
    for(int i=0; i<alele.Length; i++) {
        float diff = other.alele[i] - alele[i];
        sum += diff*diff;
    }
    return (float)Math.Sqrt(sum);
}

I think Niching will give you what you'd like. It compares all the Chromosomes in the population to determine their similarity and assigns a "niche" value to each. The chromosomes are then "penalized" for belonging to a niche using a technique called Explicit Fitness Sharing. The fitness values are divided by the number of Chromosomes in each niche. So if you have three in niche group A (A,A,A) instead of that niche being 3 times as likely to be chosen, it's treated as a single entity.

I compared my sample with Explicit Fitness Sharing on and off. With a max STDDEV of 500 and Niching turned OFF, there were about 18-20 niches (so basically 5 duplicates of each item in a 100 population). With Niching turned ON, there were about 85 niches. Thats 85% unique Chromosomes in the population. In the output of my test, you can see the diversity after 17000 generations.

Here's the niching code:

// returns: total number of niches in this population
// max_stddev -- any two chromosomes with population stddev less than this max
//               will be grouped together
int ComputeNiches(float max_stddev) {
    List<int> niches = new List<int>();

    // clear niches
    foreach(var c in population) {
        c.niche = -1;
    }

    // calculate niches
    for(int i=0; i<population.Count; i++) {
        var c = population[i];
        if( c.niche != -1) continue; // niche already set

        // compute the niche by finding the stddev between the two chromosomes 
        c.niche = niches.Count;
        int count_in_niche = 1; // includes the curent Chromosome
        for(int j=i+1; j<population.Count; j++) {
            var d = population[j];
            float stddev = c.StdDev(d);
            if(stddev < max_stddev) {
                d.niche = c.niche; // same niche
                ++count_in_niche;
            }
        }
        niches.Add(count_in_niche);
    }

    // penalize Chromosomes by their niche size
    foreach(var c in population) {
        c.niche_scaled_fitness = c.scaled_fitness / niches[c.niche];
    }

    return niches.Count;
}

[Edit: post-analysis and update of Anton's code]

I know this probably isn't the right forum to address homework problems, but since I did the effort before knowing this, and I had a lot of fun doing it, I figure it can only be helpful to Anton.

Genotip.cs, Kromosom.cs, KromoMain.cs

This code maintains good diversity, and I was able in one run to get the "raw fitness" down to 47, which is in your case the average squared error. That was pretty close!

As noted in my comment, I'd like to try to help you in your programming, not just help you with your homework. Please read these analysis of your work.

As we expected, there was no need to make a "more diverse" population from the start. Just generate some completely random Kromosomes.
Your mutations and crossovers were highly destructive, and you only had a few of them. I added several new operators that seem to work better for this problem.
You were throwing away the best solution. When I got your code running with only Tournament Selection, there would be one Kromo that was 99% better than all the rest. With tournament selection, that best value was very likely to be forgotten. I added a bit of "elitism" which keeps a copy of that value for the next generation.
Consider object oriented techniques. Compare the re-write I sent you with my original code.
Don't duplicate code. You had the sampling parameters in two different classes.
Keep your code clean. There were several unused parts of code. Especially when submitting questions to SO, try to narrow it down, remove unused code, and do some cleaning up.
Comment your code! I've commented the re-work significantly. I know it's Serbian, but even a few comments will help someone else understand what you are doing and what you intended to do.
Overall, nice job implementing some of the more sophisticated things like Tournament Selection
Prefer double[] arrays instead of List. There's less overhead. Also, several of your List temp variables weren't even needed. Your structure

List temp = new List(); for(...) { temp.add(value); } for(each value in temp) { sum += value } average = sum / temp.Count

can easily be written as:

sum = 0
for(...) {
    sum += value;
}
average = sum / count;

In several places you forgot to initialize a loop variable, which could have easily added to your problem. Something like this will cause serious problems, and it was in your fitness code along with one or two other places

double fit = 0; for(each chromosome) { // YOU SHOULD INITIALIZE fit HERE inside the LOOP for(each allele) { fit += ...; } fit /= count; }

Good luck programming!

Updated with niching, which is what it sounds like you're trying to do. Cheers. — cod3monk3y, Sep 22 '12 at 16:37
+1 lots of good stuff in here. I think the OP is still struggling with what should tackled in the initial population vs what goes into the algorithm itself. — , Sep 22 '12 at 19:49
@cod3monk3y - Thank you very much, so what do you suggest, can you look at my source code? http://pastebin.com/D4RvKMHq — Antun Tun, Sep 23 '12 at 10:34
where will I put my fitness function in your code to see is this working? Can you take a look, please. Thank you — Antun Tun, Sep 23 '12 at 11:11
Can you just tell me is inverz least squares method a good fitnes function for polynomial coefficient calculating? You hace it in my source code under the name RacunajFunkcijuFitnesa(). — Antun Tun, Sep 23 '12 at 11:39
Hi Antun. I have modified your source code and have it working (to some degree) after several modifications and a few serious bug fixes. I'll post it tomorrow (in about 4 or so hours). — cod3monk3y, Sep 23 '12 at 11:42
Thank you very much, how can I thank you, is there a way for your kindness, it was for my homework, I was struggling for 7 days with premature convergance but know I have learned. — Antun Tun, Sep 23 '12 at 11:49
If you need I can give you an email, your StdDev can be made faster : in return statement divide argument by length of allele, in converges then 10 times faster :-) — Antun Tun, Sep 23 '12 at 16:08
Hi Antun. I'm glad we've helped you. I'm not certain my position on [addressing homework problems](http://meta.stackexchange.com/questions/10811/how-to-ask-and-answer-homework-questions) on SO, or what the community really allows. I think it would have been better to state up front that this was a homework problem. Ultimately, I found it pretty enjoyable and it's something I've actually wanted to work on for a while. I'll edit my answer with an explanation which hopefully you can learn from, from what I learned from your code. Hopefully it will help you in the future! — cod3monk3y, Sep 23 '12 at 19:33
Thank you once more, I never intended to make you do my homework, as you saw, most of the things were done but not efficiently. Thank you once again, god bless you! — Antun Tun, Sep 23 '12 at 21:05
How much iterations does it take, approximately, mine is stack at the same value for almost 500 itrations? — Antun Tun, Sep 24 '12 at 07:47
Just one more QUESTION. "Where did you learn from about genetic algorithms?" Thank you, links would be appriciated. — Antun Tun, Sep 24 '12 at 09:38
This book is a _great_ starting point: [AI Techniques for Game Programmers](http://books.google.com/books/about/AI_Techniques_for_Game_Programming.html?id=6F9nKQdj8hwC). The iteration count will be high, around 100,000. On my machine I get about 100-200 generations a second, which could probably be improved. — cod3monk3y, Sep 24 '12 at 14:08
How many chromosomes, I just went through 300 000 count with 800 chromosomes and no conversion? — Antun Tun, Sep 24 '12 at 18:11
I used between 80 and 200 chromosomes. I haven't __perfected__ the algorithm, but basically you can look at the file output (from my modification of your source) to see each generation. Use that information in combination with adding different mutation and crossover operations to help improve your algorithm. The main point, basically to answer your intial question, is that you should be using the mutation, crossover, and fitness scaling to control your diversity. And that for diversity you need to compare the alleles, not the fitnesses. — cod3monk3y, Sep 24 '12 at 18:27
So when diversity is below some level I increase mutation, what about crossover, the same for crossover? And to how much should I increase mutation? Thank you and sorry for being pain in the ass — Antun Tun, Sep 24 '12 at 18:38

score 1 · Answer 3 · edited May 23 '17 at 12:04

I'm going to take a quick swing at this, but Isaac's pretty much right. You need to let the GA do its job. You have a generation of individuals (chromosomes, whatever), and they're all over the scale on fitness (or maybe they're all identical).

You pick some good ones to mutate (by themselves) and crossover (with each other). You maybe use the top 10% to generate another full population and throw out the bottom 90%. Maybe you always keep the top guy around (Elitism).

You iterate at this for a while until your GA stops improving because the individuals are all very much alike. You've ended up with very little diversity in your population.

What might help you is to 1) make your mutations more effective, 2) find a better way to select individuals to mutate. In my comment I recommended AI Techniques for Game Programmers. It's a great book. Very easy to read.

To list a few headings from the book, the things you're looking for are:

Selection techniques like Roulette Selection (on stackoveflow) (on wikipedia) and Stochastic Universal Sampling, which control how you select your individuals. I've always liked Roulette Selection. You set the probabilities that an individual will be selected. It's not just simple white-noise random sampling.

I used this outside of GA for selecting 4 letters from the Roman alphabet randomly. I assigned a value from 0.0 to 1.0 to each letter. Every time the user (child) would pick the letter correctly, I would lower that value by, say 0.1. This would increase the likelihood that the other letters would be selected. If after 10 times, the user picked the correct letter, the value would be 0.0, and there would be (almost) no chance that letter would be presented again.

Fitness Scaling techniques like Rank Scaling, Sigma Scaling, and Boltzmann Scaling (pdf on ftp!!!) that let you modify your raw fitness values to come up with adjusted fitness values. Some of these are dynamic, like Boltzmann Scaling, which allows you to set a "pressure" or "temperature" that changes over time. Increased "pressure" means that fitter individuals are selected. Decreased pressure means that any individual in the population can be selected.

I think of it this way: you're searching through multi-dimensional space for a solution. You hit a "peak" and work your way up into it. The pressure to be fit is very high. You snug right into that local maxima. Now your fitness can't change. Your mutations aren't getting you out of the peak. So you start to reduce the pressure and just, oh, select items randomly. Your fitness levels start to drop, which is okay for a while. Then you start to increase the pressure again, and surprise! You've skipped out of the local maxima and found a lovely new local maxima to climb into. Increase the pressure again!

Niching (which I've never used, but appears to be a way to group similar individuals together). Say you have two pretty good individuals, but they're wildly different. They keep getting selected. They keep mutating slightly, and not getting much better. Now you have half your population as minor variants of A, and half your population minor variants of B. This seems like a way to say, hey, what's the average fitness of that entire group A? and what for B? And what for every other niche you have. Then do your selection based on the average fitness for each niche. Pick your niche, then select a random individual from that niche. Maybe I'll start using this after all. I like it!

Hope you find some of that helpful!

Thank you for you links, I will sure have a look, but that was not my problem. I have to find out how to make initial seed with more diversity. The way that I am doing it now is showen in first post and is too slow, is there anothery way? — Antun Tun, Sep 22 '12 at 10:12
The diversity in the literature you're referring to likely means diversity in the correlation between individuals. I think you're conflating fitness with diversity. In the photo you posted, the correlation between the first two Aleles is 0.1, which is extremely low (i.e. high diversity). Can you post your fitness code? How many Chromosomes do you have? How many generations? What generation is that image from? Can you post a screen shot after 100 generations? — cod3monk3y, Sep 22 '12 at 12:06
here you go:80 chromosome - 1. generation variance = 68.405597402411193E-19 2. generation variance = 3.3390292462201335E-19 after 100 generations = 0.0000000000000000013084417866801596 after 10 000 generations fitness = 7.4569107832092211E-19 after 100 000 = 6.6447231810762E-19 — Antun Tun, Sep 22 '12 at 13:08
That's the variance of the fitnesses? I'll need to see your fitness calculation in order to help you any further. I've posted some sample code (2nd answer) that might help you, which solves a problem using your setup without making any extra effort to ensure a new chromosome is unique. — cod3monk3y, Sep 22 '12 at 14:24
HEre is my whole source code: http://pastebin.com/D4RvKMHq I would like to thank you for your effort, if we solve this I'll give you my green thick mark :-) Function for the fitnes is called: RacunajFunkcijuFitnesa(); — Antun Tun, Sep 22 '12 at 19:50

score 0 · Answer 4 · answered Sep 22 '12 at 00:31

0

If you need true random numbers for your application, I recommend you check out Random.org. They have a free HTTP API, and clients for just about every language.

The randomness comes from atmospheric noise, which for many purposes is better than the pseudo-random number algorithms typically used in computer programs.

(I am unaffiliated with Random.org, although I did contribute the PHP client).

answered Sep 22 '12 at 00:31

Jonathon Reinhart

132,704
33
254
328

I don't think the OP is complaining that his numbers aren't adequately random. – Sep 22 '12 at 01:08
Yes, Isaac you are wright, my numbers do have required diversity in value but not in fitness. But thank you for you contribution – Antun Tun Sep 22 '12 at 10:13

score 0 · Answer 5 · answered Sep 22 '12 at 01:09

I think your problem is in how your fitness function and how you select candidates, not in how random values are. Your filtering feels too strict that may not even allow enough elements to be accepted.

Sample

values: random float 0-10000.
fitness function square root(n)
desired distribution of fitness - linear with distance at least 1.

With this fitness function you will quickly get most of the 1-wide "spots" taken (as you have at most 100 places), so every next one will take longer. At some point there will be several tiny ranges left and most of the results will simply rejected, even worse after you get about 50 numbers places there is a good chance that next one simply will not be able to fit.

Yes, you are wright, so what do you suggest, how to calculate diversity different? — Antun Tun, Sep 22 '12 at 10:17

Genetic algorithm initial seed diversity using value encoding C#

5 Answers5