
I'm currently building a machine learning library.

The gist of the problem: when I add too many (and too big) hidden layers to the NN, it outputs NaN or -NaN. Why does that happen? Can I prevent it? If I can't, what can/should I do so the library handles it professionally?

One of the library's (full-scale) unit tests looks like this:

struct testTrainWithMultipleLayers : public ZNN::NeuralNetwork<double>
{
    ZNN::NeuralNetwork<double> b;
    std::vector<std::vector<double>> input{{1, 0}, {0, 1}, {0, 0}, {1, 1}};
    std::vector<std::vector<double>> target{{1}, {1}, {0}, {0}};
    testTrainWithMultipleLayers()
    {
        withNeuralNetwork();
        trainForNIterations(1000);
        requireCorrectOutputs();
    }

    void withNeuralNetwork()
    {
        b.setInputLayerSize(2);
        b.addHiddenLayer(50);
        b.setOutputLayerSize(1);
        b.setLearningRate(0.7);
        b.setNormalization(ZNN::Fermi<double>());
    }

    void trainForNIterations(size_t iterations)
    {
        // train the neural network for the given number of iterations
    }

    void requireCorrectOutputs()
    {
        ///check the expected values were correctly approximated
    }
};

The test passes! But when I change withNeuralNetwork to this:

    void withNeuralNetwork()
    {
        b.setInputLayerSize(2);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.addHiddenLayer(50);
        b.setOutputLayerSize(1);
        b.setLearningRate(0.7);
        b.setNormalization(ZNN::Fermi<double>());
    }

The output of my unit test looks like this:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test is a Catch v2.3.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
Neural Network
-------------------------------------------------------------------------------
test_NN.hpp:9
...............................................................................

test_NN.hpp:263: FAILED:
  CHECK( Disabled by CATCH_CONFIG_DISABLE_STRINGIFICATION )
with expansion:
  Approx( nan ) == 2.0

test_NN.hpp:264: FAILED:
  CHECK( Disabled by CATCH_CONFIG_DISABLE_STRINGIFICATION )
with expansion:
  Approx( nan ) == 2.0

test_NN.hpp:265: FAILED:
  CHECK( Disabled by CATCH_CONFIG_DISABLE_STRINGIFICATION )
with expansion:
  Approx( nan ) == 1.0

test_NN.hpp:266: FAILED:
  CHECK( Disabled by CATCH_CONFIG_DISABLE_STRINGIFICATION )
with expansion:
  Approx( nan ) == 1.0

===============================================================================
test cases:  5 |  4 passed | 1 failed
assertions: 50 | 46 passed | 4 failed

The NN outputs NaN or -NaN (depending on the setup).


It might be worth noting that this happens not only with this specific unit test, but in every NN application I've created using this library (which until now hasn't been too big of a deal, as I was able to shrink the NN until it worked).


I've tracked down where this first happens: in the function that updates a neuron's weights, the algorithm receives a derivative that is NaN, which makes the entire NN cascade into outputting NaN.
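
To catch this as early as possible, one option is to check every derivative for finiteness right at the weight update, so the failure is stopped where it originates instead of after it has cascaded through the network. This is only a hypothetical sketch (updateWeights and its parameters are made-up names, not ZNN's actual API):

    #include <cassert>
    #include <cmath>
    #include <vector>

    // Hypothetical weight-update step (not ZNN's real code): stop on the
    // first non-finite derivative instead of letting NaN propagate.
    void updateWeights(std::vector<double> &weights,
                       const std::vector<double> &derivatives,
                       double learningRate)
    {
        for (std::size_t i = 0; i < weights.size(); ++i)
        {
            assert(std::isfinite(derivatives[i]) && "derivative is NaN/inf");
            weights[i] -= learningRate * derivatives[i];
        }
    }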

I have made sure that no division by zero occurs. The only divisions in the entire NN's code are the 1/2 * error used to calculate the NN's error and the division in the normalization, and I've made sure there is no division by zero there either. Or rather: exp(x) = 0 has no solution for x, so the normalization's denominator can never be exactly zero.
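
Note that a NaN does not require a division by zero. Assuming ZNN::Fermi is the usual logistic function 1 / (1 + exp(-x)), its denominator indeed can never be exactly zero; but in double precision the weighted sums and derivatives of a large or deep network can overflow to infinity, and expressions such as inf - inf or 0 * inf then yield NaN. A minimal standalone sketch of both effects (not ZNN code):

    #include <cmath>
    #include <cstdio>

    // Assumed form of the Fermi/logistic normalization: 1 / (1 + exp(-x)).
    double fermi(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    int main()
    {
        double big = std::exp(1000.0); // overflows to +inf
        double a = big - big;          // inf - inf -> NaN
        double b = 0.0 * big;          // 0 * inf   -> NaN
        std::printf("%f %f %f\n", fermi(-1000.0), a, b); // prints 0 and two NaNs
        return 0;
    }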


The unit test's code is here: https://github.com/Wittmaxi/ZENeural/blob/master/library/tests/test_NN.cpp#L220


Edit

This also happens when I change withNeuralNetwork to this:

    void withNeuralNetwork()
    {
        b.setInputLayerSize(2);
        b.addHiddenLayer(5000);
        b.setOutputLayerSize(1);
        b.setLearningRate(0.7);
        b.setNormalization(ZNN::Fermi<double>());
    }
  • Have a read of: https://stackoverflow.com/questions/5393997/stopping-the-debugger-when-a-nan-floating-point-number-is-produced And https://stackoverflow.com/questions/3615724/how-to-trace-a-nan-in-c – Richard Critten Aug 25 '18 at 17:58
  • @RichardCritten I enabled that. The issue is in "calculateDerivatives", which is similar to what I had already tracked down beforehand. (Thanks for showing me that way of catching NaNs, it will come in handy again in the future ;)) – Maximilian Aug 25 '18 at 19:35
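
For reference, the trapping approach from the comments boils down to something like this (a Linux/glibc-specific sketch; feenableexcept is a GNU extension):

    #include <cfenv> // feenableexcept (GNU extension)

    int main()
    {
        // Make the first operation that produces a NaN (or divides by zero,
        // or overflows) raise SIGFPE, so a debugger stops at the culprit.
        feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW);
        // ... run the training code here ...
        return 0;
    }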
