
I'm making a simple learning simulation, where there are multiple organisms on screen. They're supposed to learn how to eat, using their simple neural networks. They have 4 neurons, and each neuron activates movement in one direction (it's a 2D plane viewed from a bird's-eye perspective, so there are only four directions, and thus four outputs are required). Their only inputs are four "eyes". Only one eye can be active at a time, and it basically serves as a pointer to the nearest object (either a green food block or another organism).

Thus, the network can be imagined like this:

(diagram: the four eye inputs, each connected by a weight to the four movement outputs)

And an organism looks like this (both in theory and the actual simulation, where they really are red blocks with their eyes around them):

(diagram: a red block with its four eyes placed around it)

And this is how it all looks (this is an old version, where eyes still didn't work, but it's similar):

(screenshot of an earlier version of the simulation)

Now that I have described my general idea, let me get to the heart of the problem...

  1. Initialization: First, I create some organisms and food. Then, all 16 weights in their neural networks are set to random values, like this: `weight = random.random() * threshold * 2`. Threshold is a global value that describes how much input each neuron needs in order to activate ("fire"). It is usually set to 1.

  2. Learning: By default, the weights in the neural networks are lowered by 1% each step. But, if an organism actually manages to eat something, the connection between the last active input and output is strengthened. (A short sketch of both steps follows this list.)
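
Roughly, in code (the 4x4 weight layout and the bonus value here are just for illustration):

```python
import random

THRESHOLD = 1.0          # global value: how much input a neuron needs to fire
N_EYES, N_MOVES = 4, 4   # four inputs (eyes), four outputs (directions)

# 1. Initialization: 16 weights set to random values
weights = [[random.random() * THRESHOLD * 2 for _ in range(N_MOVES)]
           for _ in range(N_EYES)]

def decay(weights, rate=0.01):
    """2. Learning, part one: every step, all weights are lowered by 1%."""
    for row in weights:
        for j in range(len(row)):
            row[j] *= 1.0 - rate

def reinforce(weights, last_active_eye, last_move, bonus=0.1):
    """2. Learning, part two: on eating, strengthen the connection between
    the last active input and the output that fired."""
    weights[last_active_eye][last_move] += bonus
```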

But there is a big problem. I think that this isn't a good approach, because they don't actually learn anything! Only those that had their initial weights randomly set to be beneficial will get a chance to eat something, and then only they will have their weights strengthened! What about those that had their connections set up badly? They'll just die, not learn.

How do I avoid this? The only solution that comes to mind is to randomly increase/decrease the weights, so that eventually, someone will get the right configuration, and eat something by chance. But I find this solution to be very crude and ugly. Do you have any ideas?

corazza
  • This sounds similar to artificial life: http://en.wikipedia.org/wiki/Artificial_life However, there the focus is on the evolution of the organisms, i.e. they reproduce and thus the more viable survive. Is this something you might want to do? – Mika Fischer Jan 25 '12 at 17:30
  • The solution sounds crude and ugly, but lends itself to unexpected solutions. Try increasing the weight variation, and let evolution run its course ;) Yes, the individuals aren't learning, but the 'species' is. – Joel Cornett Jan 25 '12 at 17:42
  • To add to @JoelCornett's comment: You may also need to increase the population size and run for longer. Also of interest in connection with this: http://en.wikipedia.org/wiki/Baldwin_effect – Josh Bleecher Snyder Jan 25 '12 at 18:13

6 Answers


As mentioned by Mika Fischer, this sounds similar to artificial life problems, so that's one avenue you could look at.

It also sounds a bit like you're trying to reinvent Reinforcement Learning. I would recommend reading through Reinforcement Learning: An Introduction, which is freely available in HTML form on the book's website, or purchasable in dead-tree format. Example code and solutions are also provided there.

Use of neural networks (and other function approximators) and planning techniques is discussed later in the book, so don't get discouraged if the initial stuff seems too basic or non-applicable to your problem.
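
To give a flavour of the kind of method the book covers, here's a minimal tabular Q-learning sketch mapped onto your setup (active eye = state, movement direction = action); the reward values, learning rate, and exploration rate below are just illustrative numbers, not anything from your question or the book:

```python
import random

N_STATES, N_ACTIONS = 4, 4              # four eyes, four movement directions
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def choose_action(state):
    """Epsilon-greedy: usually pick the best-known move, sometimes explore."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def learn(state, action, reward, next_state):
    """One-step Q-learning update after each move."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```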

kenm

This is similar to issues with trying to find a global minimum, where it's easy to get stuck in a local minimum. Consider trying to find the global minimum for the profile below: you place the ball in different places and follow it as it rolls down the hill to the minimum, but depending on where you place it, you may get stuck in a local dip.

(plot: a bumpy one-dimensional profile with several local dips and a single global minimum)

That is, in complicated situations, you can't always get to the best solution from all starting points using small optimizing increments. The general solutions to this are to fluctuate the parameters (i.e., weights, in this case) more vigorously (and usually reduce the size of the fluctuations as the simulation progresses -- like in simulated annealing), or just realize that a bunch of the starting points aren't going to go anywhere interesting.
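
For example, a rough sketch of the "shrink the fluctuations over time" idea, assuming each organism's weights sit in a flat list:

```python
import random

def perturb(weights, temperature):
    """Nudge every weight by a random amount scaled by the current temperature."""
    return [w + random.uniform(-temperature, temperature) for w in weights]

temperature = 1.0
for step in range(100000):
    # ... run a simulation step, occasionally perturb(organism.weights, temperature) ...
    temperature *= 0.9999   # cool down: big jumps early, small refinements later
```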

tom10
  • Then it seems like randomly increasing/decreasing the weights is the best solution. Is that what you're trying to say? – corazza Jan 25 '12 at 18:01
  • It depends. Just letting the ones with bad starting parameters die off is easiest; but if that won't work well, as might be the case if your system is highly interacting, then randomly changing the weights can be a faster and more robust approach, but it's a bit more difficult because you need to dial the amount of weight change as the simulation or individual animal progresses (*e.g.*, since you don't want to constantly change the weights of your winners because then they can never settle to something good). – tom10 Jan 25 '12 at 18:12
  • Well, just letting the bad ones die isn't a solution. This implies that there is no learning, because only the ones that randomly got beneficial weights will survive, not those that got them through a learning algorithm! Yes, I've figured out what to do! Please check the edit I made to the suggestion. – corazza Jan 25 '12 at 21:31

How do you want it to learn? You don't like the fact that randomly seeded organisms either die off or prosper, but the only time you provide feedback to an organism is when it randomly gets food.

Let's model this as hot and cold. Currently, everything feeds back "cold" except when the organism is right on top of food. So the only opportunity to learn is accidentally running over food. You can tighten this loop to provide more continuous feedback if you desire: feed back "warmer" when the organism moves toward food, and "colder" when it moves away.
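
As a sketch, assuming you can measure the distance from an organism to its nearest food block each step (the exact reward numbers are arbitrary):

```python
def feedback(prev_distance, new_distance, ate_food):
    """Continuous hot/cold signal instead of a reward only when food is eaten."""
    if ate_food:
        return 1.0     # actually ate: strong positive feedback
    if new_distance < prev_distance:
        return 0.1     # warmer: moved toward the nearest food
    return -0.1        # colder: moved away (or stood still)
```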

Now, the downside of this is that there is no input for anything else. You've only got a food-seeker learning technique. If you want your organisms to find a balance between hunger and something else (say, overcrowding avoidance, mating, etc), the whole mechanism probably needs to be re-thought.

ccoakley
  • Hmm, that's an interesting idea! And yes, I only want them to learn how to eat, this is just a small project. But wouldn't that be... "Cheating"? I understand the general idea, but simply moving towards food is, well, the whole goal of this simulation! I'd like them to achieve that in a more "indirect" way, so to speak. – corazza Jan 25 '12 at 18:05
  • @bane: That's why I prefaced it with "How do you want it to learn?" Your own constraints matter quite a bit. But if you model learning as taking actions and getting feedback, then you have a limited number of things you can change. You can have your organisms "think ahead" by branching on virtual decisions (ordering the actions instead of just executing the highest weight) and backtracking, but that's roughly equivalent to just creating a bunch of organisms and allowing the unlucky losers to die. – ccoakley Jan 25 '12 at 18:16
  • I'd like them to learn by themselves as much as possible, with little or no "guidance". That's why I rolled with randomness. But I really like the idea of "thinking ahead". – corazza Jan 25 '12 at 21:45

There are several algorithms that can be used to optimize the weights in a neural network, the most common of which is the backpropagation algorithm.

From reading your question I gather that you are trying to build neural network bots that will search for food. The way to achieve this with backpropagation would be to have an initial learning period, where the weights are initially randomly set (as you're doing) and gradually refined using the backpropagation algorithm until they reach a performance level you're happy with. At that point you can stop them learning and allow them to frolic freely in flatland.

However, I think there might be a few issues with your network design.

Firstly, if only one eye is active at any time, it would make more sense to have just one input node and keep track of orientation some other way (if I'm understanding that correctly). Put simply, if there is only one active eye and four possible actions (forward, back, left, right), then the inputs from the inactive eyes (presumably zero) have no bearing on the output decision; in fact, I suspect the weights from each input to all outputs would converge and essentially duplicate the same function. It also needlessly increases the complexity of the network and increases the learning time.

Secondly, you don't need that many output neurons to represent all possible actions. As you have described it, your output would be {1,0,0,0} = right, {0,1,0,0} = left and so on. Depending on the type of neuron modeled, this can be done with two or even one output neuron. If using binary neurons (each output is either 1 or 0), you could do something like {0,0} = back, {1,1} = forward, {1,0} = left, {0,1} = right. Using a sigmoidal output neuron (the output is a real number from 0..1), you could do {0} = back, {0.33} = left, {0.66} = right, {1} = forward.
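
For example (the bin boundaries for the sigmoid version are just one possible choice):

```python
def decode_binary(o1, o2):
    """Two binary output neurons -> four moves."""
    return {(1, 1): "forward", (0, 0): "back",
            (1, 0): "left", (0, 1): "right"}[(o1, o2)]

def decode_sigmoid(y):
    """One sigmoidal output neuron in [0, 1] -> four moves, by binning."""
    if y < 0.25:
        return "back"
    if y < 0.5:
        return "left"
    if y < 0.75:
        return "right"
    return "forward"
```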

Nicholas McCarthy
  • 2^2 = 4. So two bits would represent four states, and I need five (one of them is standing still). I really think that this network is simple, I know there are ways to make it even simpler or efficient, but at this point that simply isn't necessary. I heard about backpropagation, I studied it a bit before, but I concluded that I won't really need it on a project as simple as this one. Also, I'd like them to learn in real time, on the flatland, not train them before releasing them... That's the general idea, at least. – corazza Jan 25 '12 at 21:49
  • Ah, I hadn't figured on a standing still action, although the sigmoid output neuron point still stands. – Nicholas McCarthy Jan 26 '12 at 15:40
  • Learning in real-time is fine, but at some point they'll reach a stage where they have reached an optimal weighting scheme and any further changes would degrade its performance. Perhaps you could experiment with encoding the set of weights in a genetic algorithm? Allow bots who haven't 'eaten' in x days to die off, and bots who have eaten y amount can spawn copies of themselves with a slight chance of mutation. – Nicholas McCarthy Jan 26 '12 at 15:50
  • Well, that's what I'm doing. Also, at first I do change the weights randomly, but with each food block eaten, the chance for a random change decreases! This is passed on to their offspring. So basically, the ones that eat the most will change the least! – corazza Jan 26 '12 at 16:33

I can see a bunch of potential problems.

First and foremost, I'm not clear about the algorithm that updates your weights. I like the 1% decrease as a concept-- it looks like you're trying to discount distant memories, which is good in principle-- but the rest probably isn't sufficient. You need to look at some of the standard update algorithms like backpropagation, but that's just a start, because....

...You're only giving your network credit for the last stage of eating the food. It doesn't seem like there's any direct mechanism for getting your network incrementally closer to the food, or to clumps of food. Even taking the directionality of the eyes at face value, your eyes are very simple, and there's not much long term memory.

Also, if your network diagram is accurate, it's probably not sufficient. You really want to have a hidden layer (at least one) between the sensors and the actuators, if you use something related to backpropagation. There are detailed mathematics behind that statement, but it boils down to: "Hidden layers will allow good solutions to more problems."
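
For concreteness, a forward pass with a single hidden layer between the four eye inputs and the four movement outputs might look like this (the hidden-layer size and the sigmoid activation are arbitrary choices, not anything from the question):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

N_IN, N_HIDDEN, N_OUT = 4, 6, 4
w_in_hidden = [[random.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_IN)]
w_hidden_out = [[random.uniform(-1, 1) for _ in range(N_OUT)] for _ in range(N_HIDDEN)]

def forward(eyes):
    """eyes: four values, with only the active eye set to 1."""
    hidden = [sigmoid(sum(eyes[i] * w_in_hidden[i][h] for i in range(N_IN)))
              for h in range(N_HIDDEN)]
    return [sigmoid(sum(hidden[h] * w_hidden_out[h][o] for h in range(N_HIDDEN)))
            for o in range(N_OUT)]
```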

Now, notice that a lot of my comments are talking about the architecture of the network, but only in general terms without saying concretely, "This will work," or "that will work." That's because I don't know either (although I think Kwatford's suggestion of reinforcement learning is a very good one.) Sometimes, you can evolve the network parameters as well as the network instances. One such technique is Neuroevolution of Augmenting Topologies, or "NEAT". Might be worth a look.

Novak
  • I know what backpropagation is, and I've worked with it a bit, **but**, I really believe that my network design **is** sufficient. Because the world around them is really simple! There's no need to complicate, I think it's obvious that this problem can be solved with a simple network design. – corazza Jan 25 '12 at 21:28
  • Exclusive-Or is a simple problem, too, but it provably requires more than one layer to solve. As the number of variables increases, the fraction of linearly separable functions in the solution space drops very rapidly. Even when you don't mathematically need an extra layer, having one often speeds convergence. – Novak Jan 25 '12 at 21:39

I think a more complex example of what you're doing is presented by Polyworld.

You can also see the Google Tech Talks presentation from 2007: http://www.youtube.com/watch?v=_m97_kL4ox0

However, the fundamental idea is to take an evolutionary approach within your system: use small random mutations combined with genetic cross-over (as the main form of diversification) and select individuals which are "better" suited to survive the environment.
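
A minimal sketch of that loop, treating each organism's 16 weights as its genome (the selection rule, mutation rate, and use of food eaten as the fitness measure are just illustrative):

```python
import random

def crossover(a, b):
    """Single-point crossover of two weight genomes."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rate=0.05, scale=0.2):
    """Small random mutations on a few weights."""
    return [w + random.uniform(-scale, scale) if random.random() < rate else w
            for w in genome]

def next_generation(scored):
    """scored: list of (fitness, genome) pairs, e.g. fitness = food eaten."""
    scored.sort(key=lambda fg: fg[0], reverse=True)
    parents = [g for _, g in scored[:max(2, len(scored) // 2)]]  # keep the fitter half
    return [mutate(crossover(*random.sample(parents, 2)))
            for _ in range(len(scored))]
```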

Kiril
  • Yes, I watched that a long time ago. I'm not really aiming at evolution, though, but machine learning. There is, of course, some evolution, but it shouldn't play the crucial role. – corazza Jan 25 '12 at 21:23
  • Evolutionary/genetic algorithms are part of machine learning... if you're trying to avoid getting stuck on a local minimum/maximum, then I think that evo/gen approaches are pretty good for that purpose. – Kiril Jan 25 '12 at 21:47