
This is a beginner-level question. I have several training inputs in binary, and for the neural network I am using a sigmoid thresholding function SigmoidFn(Input1*Weights), where

SigmoidFn(x) =  1./(1+exp(-1.*x));

Using the above function gives continuous real numbers, but I want the output to be binary, since the network is a Hopfield neural net (single layer, 5 input nodes and 5 output nodes). The problem I am facing is that I am unable to correctly understand the usage and implementation of the various thresholding functions. The weights given below are the true weights, as provided in the paper. I am using these weights to generate several training examples (output samples) by keeping the weights fixed, that is, just running the neural network several times.

Weights = [0.0  0.5  0.0  0.2  0.0
           0.0  0.0  1.0  0.0  0.0
           0.0  0.0  0.0  1.0  0.0
           0.0  1.0  0.0  0.0  0.0
           0.0  0.0  0.0 -0.6  0.0];


Input1 = [0,1,0,0,0]

x = Input1*Weights;   % x = 0 0 1 0 0
  1. As can be seen, the result of the multiplication is the second row of Weights. Is this a mere coincidence?

  2. Next,

    SigmoidFn  =  1./(1+exp(-1.*x))
    
    SigmoidFn =
    
    0.5000    0.5000    0.7311    0.5000    0.5000
    
  3. round(SigmoidFn)
    
    ans =
    
         1     1     1     1     1
    
  4. Input2 = [1,0,0,0,0]
    
    x = Input2*Weights
    
    x =  0  0.5000  0  0.2000  0
    SigmoidFn  =  1./(1+exp(-1.*x))
    
    SigmoidFn =  0.5000    0.6225    0.5000    0.5498    0.5000
    
    round(SigmoidFn)
    
    ans =
    
          1     1     1     1     1
    

    Is it good practice to use the round function, round(SigmoidFn(x))? The result obtained is not correct. How should I obtain a binary result when I use any of these threshold functions: (a) hard limit, (b) logistic sigmoid, (c) tanh?

Can somebody please show the proper code for thresholding, and briefly explain when to use which activation function? I mean, there should be some logic to it; otherwise, why are there different kinds of functions?
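As an aside, here is a minimal sketch of the three thresholding options named above (the 0.5 cutoff for the logistic sigmoid and the 0 cutoffs for the hard limit and tanh are the conventional midpoints, not values taken from the paper):

    x = Input1 * Weights;                 % weighted input, e.g. [0 0 1 0 0]
    hardlimit = x >= 0;                   % (a) hard limit: 1 where x >= 0, else 0
    logistic  = 1./(1+exp(-x)) >= 0.5;    % (b) logistic sigmoid, cut at its midpoint 0.5
    hyptan    = tanh(x) >= 0;             % (c) tanh, cut at its midpoint 0

Note that cutting the logistic at 0.5, or tanh at 0, is equivalent to the hard limit at 0; the smooth functions only matter if you keep the continuous values (for example, for gradient-based training).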

EDIT: Implementation of Hopfield recall of the input pattern by successive iterations, keeping the weights fixed:

Training1 = [1,0,0,0,0];
offset = 0;
t = 1;
X(t,:) = Training1;
err = 1;
while err ~= 0
    Out = X(t,:) * Weights > offset;                          % hard-limit update
    err = ((Out - X(t,:)) * (Out - X(t,:)).') / numel(Out);   % mean squared change
    t = t + 1;
    X(t,:) = Out;
end
SKM
  • On question 1: No, this is definitely not a coincidence. That is just how [matrix multiplication](http://www.c-sharpcorner.com/UploadFile/941fc8/matrix-multiplication-in-java/Images/MatrixMultiplication-3.png) works. – Dennis Jaheruddin Apr 08 '14 at 09:25

1 Answer


Hopfield networks do not use a sigmoid nonlinearity; the state of a node is simply updated to whether its weighted input is greater than or equal to its offset.

You want something like

output2 = Weights * Input1' >= offsets;

where offsets is the same size as Input1. I used Weights * Input1' instead of Input1 * Weights because most examples I have seen use left-multiplication for updating (that is, the rows of the weight matrix label the output nodes and the columns label the input nodes), but you will have to look at wherever you got your weight matrix to be sure.
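For reference, the two conventions are related by a transpose, which you can check directly (using the Weights matrix above):

x = [0,1,0,0,0];
isequal((Weights * x')', x * Weights')   % returns logical 1 (true)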

You should be aware that you will have to perform this update operation many times before you converge to a fixed point which represents a stored pattern.

In response to your further questions, the weight matrix you have chosen does not store any memories that can be recalled with a Hopfield network. It contains a cycle 2 -> 3 -> 4 -> 2 ... that will not allow the network to converge.
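You can see the cycle by iterating the update by hand; a minimal sketch, assuming a strict threshold at zero so that the inactive nodes stay off:

X = [0,1,0,0,0];
for t = 1:6
    X = X * Weights > 0;   % strict hard-limit update
    disp(find(X));         % prints 3, 4, 2, 3, 4, 2 - the period-3 cycle
end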

In general you would recover a memory in a way similar to what you wrote in your edit:

X = [1,0,0,0,0];   % initial probe state (a possibly corrupted pattern)
offset = 0;        % assume all node offsets are zero
t = 1;
err = 1;
nIter = 100;       % guard against weight matrices that never converge

while err ~= 0 && t <= nIter
   prev = X;
   X = X * Weights >= offset;   % synchronous hard-limit update of every node
   err = ~isequal(X, prev);     % converged once the state stops changing
   t = t + 1;
end

if ~err
    disp(X);   % fixed point reached: X is the recalled pattern
end

If you refer to the Wikipedia page on Hopfield networks, this is what's referred to as the synchronous update method.
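For contrast, a minimal sketch of the asynchronous variant, which updates one randomly chosen node per step (the iteration count here is an arbitrary placeholder, and the convergence check is omitted):

X = [1,0,0,0,0];
offset = 0;
for t = 1:1000
    i = randi(numel(X));                 % pick one node at random
    X(i) = X * Weights(:,i) >= offset;   % update only that node
end
disp(X);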

Isaac
  • Thank you for your reply. I didn't quite understand the offset. If the input is, say, p_1 = [1 -1 -1 -1 -1 1] and output2 = Weights*p_1, then should I be doing ind1 = output2 <= 1; output2 = output2 > 1; output2(ind1) = -1; output2(ind2) = 1;? – SKM Apr 07 '14 at 17:56
  • I'm not sure I understand what you're asking... Every node in a Hopfield network has an associated offset, which is how high its input has to get before it will turn on. If you don't know the offsets you won't be able to get the right memories out of your weight matrix, but maybe you can assume that all the offsets are zero. In this case the code you want is just `output2 = (Weights * p_1 >= 0);`. – Isaac Apr 07 '14 at 18:55
  • The paper from which I have taken the above example is http://www.researchgate.net/publication/234769289_Goal-oriented_decision_support_based_on_fuzzy_cognitive_map (Goal oriented decision support based on fuzzy cognitive map), although I have used a weight matrix different from the one on page 2. A fuzzy cognitive map is similar to a Hopfield network, the difference being that other activation functions can be used. I am unable to reproduce the result depicted in the example; that's why I asked how the authors used the logistic function. – SKM Apr 07 '14 at 19:54
  • That paper is full of errors, so I find it more likely that their example is wrong than that they meant to say that they used a sigmoid function `f` such that `f([0 0 -1 0 1]) = [1 0 0 0 1]`, which they seem to claim in their example... Besides that, the model they are using is not restricted to binary values, so you should not be concerned when you get non-binary output. There is no sensible way to use a sigmoid to produce binary output; you would just use a step-function. – Isaac Apr 07 '14 at 20:56
  • Ah I see, so in order to produce binary output one should directly use a step function. From your answer, how many iterations should continue so as to recall the input from the stored pattern? Should it be a single iteration? Could you kindly edit your reply to include how the stored pattern is recalled? I have put up code to show my implementation of Hopfield, but it is not converging, since what I did is surely incorrect. Thank you lastly for going through that paper; I am grateful for your effort and time. – SKM Apr 07 '14 at 21:16
  • One last question for clarification, to make sure I understood correctly: the recall operation is done after the network model has been trained and the weights have been obtained. Keeping the weights fixed, we can repeat the while loop until an acceptable error threshold is reached. But in this process of recall, there is no weight update. Am I correct? – SKM Apr 08 '14 at 02:20
  • Correct; a Hopfield network is normally trained offline. That is, the weight matrix is constructed initially from the patterns you want to remember. The simplest way to construct these matrices is [Hebbian learning](http://en.wikipedia.org/wiki/Hopfield_network#Training); see the sketch after these comments. – Isaac Apr 08 '14 at 03:58
  • Apologies to bother you again. I noticed that the activation function is not used in recall, but during training I used the signum activation function. I am unable to recall all the patterns when I present corrupted versions of them. So, must I train the Hopfield network with corrupted as well as uncorrupted patterns? I would be extremely grateful for your suggestions. – SKM Apr 10 '14 at 18:32
  • What method are you using to train? Generally your ability to recover corrupted patterns will be a function of the size of your patterns and the number of patterns; fewer, larger patterns will be easier to recover once corrupted. Training on corrupted patterns doesn't make sense for Hopfield networks. – Isaac Apr 10 '14 at 19:00
  • When I used offset = 0, I got a white image. I tried recalling one pattern at a time. Each pattern is represented with values in {-1, 1} only, and the characters 0, 1, 2 are the patterns, each 30 bits long. I am using particle swarm optimization to train the network. – SKM Apr 10 '14 at 19:13
  • Particle swarm to optimize what? Have you tried just using the Hebbian formula to calculate the weights? – Isaac Apr 10 '14 at 20:20
  • I used particle swarm to optimize the MSE, and the particles/weights for which the MSE fitness is minimum are selected. I have tried using the pseudo-inverse and Hebb's rule as well; my objective is to compare Hebb's rule with PSO. In either case I did not use any offset; I used the signum thresholding function, as per the general theory of neural networks, which requires thresholding the output. That's why I am a bit confused, since I am unable to make it work with PSO. – SKM Apr 10 '14 at 21:43
  • MSE of what though? So you first create a random matrix, and then you attempt to recover patterns for it, and then you try to explore the matrices that recover patterns closest to your targets? – Isaac Apr 11 '14 at 19:27
  • MSE between model output and target. And the rest is correct what you said. – SKM Apr 11 '14 at 23:57
  • So if you're going to use PSO with this problem, you must take into account that most of the matrices you spawn will not converge. You can try to constrain the matrices you generate, for instance by requiring them to have only real eigenvectors, which will result in more of them converging, but I'm not sure off the top of my head how to do this generatively without just falling back on the analytical matrix constructions like Hebb's rule. – Isaac Apr 12 '14 at 22:58
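For reference, the Hebbian construction mentioned in the comments above can be sketched as follows (the two stored patterns here are hypothetical placeholders):

% Hebbian (outer-product) construction of a Hopfield weight matrix
% from +/-1 patterns stored as the rows of P.
P = [ 1 -1  1 -1  1;   % hypothetical pattern 1
     -1  1  1 -1 -1];  % hypothetical pattern 2
n = size(P, 2);        % number of nodes
W = (P' * P) / n;      % sum of outer products, scaled by pattern length
W(1:n+1:end) = 0;      % zero the diagonal: no self-connections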