
I am using a feedforward, multilayer neural network trained with backpropagation, with a sigmoid activation function whose range is -1 to 1. But the minimum error does not go below 5.8, and I want it much lower; you can see the output after 100000 iterations (graph of error against iterations in NN).

I think this is because my desired output range is above 1, while the sigmoid's range is only -1 to 1. Can anybody suggest how I can overcome this problem? My desired output range is 0 to 2.5; which activation function would be best for this range?

lkkkk

2 Answers


The vanilla sigmoid function is:

import math

def sigmoid(x):
    # standard logistic sigmoid: output lies in (0, 1)
    return 1/(1+math.e**-x)

You could transform that to:

def mySigmoid(x):
    # same shape, but scaled so the output lies in (0, 2.5)
    return 2.5/(1+math.e**-x)

in order to get the output range that you want.
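For example, a quick check of the scaled function's range (a minimal sketch; the sample inputs are chosen arbitrarily):

import math

def mySigmoid(x):
    return 2.5/(1+math.e**-x)

print(mySigmoid(-10))  # ~0.0001, near the bottom of the 0 to 2.5 range
print(mySigmoid(0))    # 1.25, the midpoint
print(mySigmoid(10))   # ~2.4999, near the top of the range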

inspectorG4dget
  • I tried your suggested option but it gives an error: OverflowError: (34, 'Numerical result out of range') – lkkkk Mar 06 '14 at 17:02
  • Could I keep using only the sigmoid function, but scale the desired value into the -1 to 1 range by dividing by some number, and then multiply the result by that same number afterwards? – lkkkk Mar 06 '14 at 17:19
  • @Latik: You would likely be unable to apply a linear transform to the [-1, 1] output, as that would take away from the "sigmoidness" of the activation function. Please wrap the call to `mySigmoid` in a try/catch and print the value of `x` on exception. I wonder what value of `x` is causing this error – inspectorG4dget Mar 06 '14 at 17:22
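A guess at the cause of that OverflowError, plus a guarded variant (a sketch only: math.e**-x overflows a float once -x exceeds roughly 709, so very negative x can be short-circuited; the cutoff of -700 and the function name are arbitrary choices):

import math

def mySigmoid(x):
    # as defined in the answer above
    return 2.5/(1+math.e**-x)

def mySigmoidSafe(x):
    # for very negative x the exponential overflows, but the true output is
    # effectively 0 there anyway, so return 0.0 directly
    if x < -700:
        return 0.0
    return 2.5/(1+math.exp(-x))

try:
    mySigmoid(-1000)                      # raises OverflowError, as reported in the comment
except OverflowError as err:
    print("overflow at x = -1000:", err)  # the try/except wrapper suggested above

print(mySigmoidSafe(-1000))               # 0.0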

If you are seeking to reduce output error, there are a couple of things to look at before tweaking a node's activation function.

First, do you have a bias node? Bias nodes have several implications, but - most relevant to this discussion - they allow the network output to be translated to the desired output range. As this reference states:

The use of biases in a neural network increases the capacity of the network to solve problems by allowing the hyperplanes that separate individual classes to be offset for superior positioning.

This post provides a very good discussion: Role of Bias in Neural Networks. This one is good, too: Why the BIAS is necessary in ANN? Should we have separate BIAS for each layer?
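In code, a bias node usually amounts to appending a constant 1 to a layer's inputs so that its weight acts as a learnable offset. A minimal sketch (apart from the 24 inputs mentioned in the question, the names and the ±0.5 initial weight range are illustrative assumptions):

import random

n_inputs = 24
weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]  # one extra weight for the bias node

def weighted_sum(inputs, weights):
    inputs = inputs + [1.0]   # the bias node always emits 1
    return sum(w * x for w, x in zip(weights, inputs))

The weighted sum is then passed to the activation function; the bias weight lets the activation's operating point shift, which is the offsetting described above.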

Second, it often helps to normalize your inputs and outputs. As you note, your sigmoid offers a range of +/- 1. This small range can be problematic when trying to learn functions that have a range of, say, 0 to 1000. To aid learning, it's common to scale and translate the values to fit the node activation functions. In that example, one might divide the values by 500, yielding a range of 0 to 2, and then subtract 1. The values are thus normalized to a range of -1 to 1, which better fits the activation function. Note that the network output should then be denormalized: first add 1 to the output, then multiply by 500.
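Expressed as code, that 0 to 1000 example looks like this (a sketch; the function names are illustrative):

def normalize(y):
    return y / 500.0 - 1.0    # 0..1000 becomes -1..1

def denormalize(y):
    return (y + 1.0) * 500.0  # invert: add 1, then multiply by 500

print(normalize(0.0), normalize(1000.0))  # -1.0 1.0
print(denormalize(normalize(750.0)))      # 750.0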

In your case, you might consider scaling the target values (the desired 0 to 2.5 outputs) by 0.8 and then subtracting 1 from the result; you would then add 1 to the network output and multiply by 1.25 to recover the desired range. Note that this method may be the easiest to apply, since it does not change your network topology the way adding a bias node would.
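The same idea for the 0 to 2.5 range in the question (again just a sketch with illustrative names):

def normalize_target(t):
    return t * 0.8 - 1.0      # 0..2.5 becomes -1..1

def denormalize_output(y):
    return (y + 1.0) * 1.25   # invert: add 1, then multiply by 1.25

print(normalize_target(2.5))                      # 1.0
print(denormalize_output(normalize_target(2.5)))  # 2.5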

Finally, have you experimented with changing the number of hidden nodes? Although I believe the first two options are better candidates for improving performance, you might give this one a try. (Just as a point of reference, I can't recall an instance in which modifying the activation function's shape improved network response more than options 1 and 2.)

Here are some good discussions of hidden layer/node configuration: multi-layer perceptron (MLP) architecture: criteria for choosing number of hidden layers and size of the hidden layer? How to choose number of hidden layers and nodes in neural network?

24 inputs make your problem a high-dimensional one. Ensure that your training dataset adequately covers the input state space, and ensure that your test data and training data are drawn from similarly representative populations. (Take a look at the "cross-validation" discussions on training neural networks.)
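A minimal sketch of holding out test data (the helper name and the 20% split are arbitrary choices, not taken from the question):

import random

def train_test_split(samples, test_fraction=0.2):
    samples = samples[:]                  # copy so the caller's list is untouched
    random.shuffle(samples)
    cut = int(len(samples) * (1 - test_fraction))
    return samples[:cut], samples[cut:]   # (training set, test set)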

Throwback1986
  • Actually my desired output range is 0 to 5000, and I tried dividing by 5000 and then multiplying the result by 5000 again; that corrected the error. But I will still try your suggestion, because when testing the NN I only get correct outputs for the exact input combinations used in training; for other inputs it does not give the correct output. I am using one bias node, since my number of inputs is 24 (nodes: 24+1), and for the hidden layer I am using the general rule, i.e. (#inputs + #outputs)*(2/3). – lkkkk Mar 07 '14 at 08:33
  • Please suggest whether there is any rule for setting the initial random weights. I have 24 inputs (binary 0 and 1) + 1 bias, 18 hidden nodes, and 1 output (range: 0 to 5000). Also, what should the error curve look like? – lkkkk Mar 07 '14 at 09:00
  • Use random initial weights (a sketch follows after these comments). Take a look at this link: http://stackoverflow.com/questions/20027598/why-should-weights-of-neural-networks-be-initialized-to-random-numbers – Throwback1986 Mar 07 '14 at 14:43
  • For the 0 to 5000 range, I directly divided by 5000 first and multiplied the result by 5000 afterwards. My error is minimized, but the network only produces the expected result for the inputs it was trained on; it does not give the required output for inputs in between. Please suggest... – lkkkk Mar 10 '14 at 05:34
  • Have you tried scaling your input to a range of -1 to +1 as suggested above? How are you implementing your network? MATLAB or some other tool? Or writing your own code? Activation functions are fairly trivial to implement - as long as you've implemented it properly, your problem likely resides elsewhere. – Throwback1986 Mar 11 '14 at 13:48
  • I implemented my own code in Python. I tried scaling my input to the range -1 to 1, but it gave a large error, so I scaled it to 0 to 1 (by dividing by 5000); now it does better, but it takes around 5000000 iterations, which is too large and time-consuming to simulate. Here I used a learning rate of 0.003 and a momentum of 0.0001. – lkkkk Mar 12 '14 at 05:01
  • Perhaps you have a problem in your implementation. I suggest posting your code in another question. I believe the scope of this one has been covered. – Throwback1986 Mar 12 '14 at 13:59
  • Please see my posted code regarding the above at this link: http://stackoverflow.com/questions/22355722/setting-parameters-in-neural-network-in-python – lkkkk Mar 12 '14 at 15:11
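On the weight-initialization question raised in the comments, a sketch using the 24-18-1 layout mentioned above (the uniform ±0.5 range is an arbitrary choice, not a recommendation from the answer):

import random

n_in, n_hidden, n_out = 24, 18, 1   # sizes taken from the comments above

# one weight per input plus one for the bias node, for every hidden unit
hidden_weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                  for _ in range(n_hidden)]

# likewise for the output layer
output_weights = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                  for _ in range(n_out)]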