
I am trying to configure an RNN in order to predict 5 different types of text entities. I am using the following configuration:

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .seed(seed)
            .iterations(100)
            .updater(Updater.ADAM)  //To configure: .updater(Adam.builder().beta1(0.9).beta2(0.999).build())
            .regularization(true).l2(1e-5)
            .weightInit(WeightInit.XAVIER)
            .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue).gradientNormalizationThreshold(1.0)
            .learningRate(2e-2)
            .trainingWorkspaceMode(WorkspaceMode.SEPARATE).inferenceWorkspaceMode(WorkspaceMode.SEPARATE)   //https://deeplearning4j.org/workspaces
            .list()
            .layer(0, new GravesLSTM.Builder().nIn(500).nOut(3)
                    .activation(Activation.TANH).build())
            .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT).activation(Activation.SOFTMAX)        //MCXENT + softmax for classification
                    .nIn(3).nOut(5).build())
            .pretrain(false).backprop(true).build();
  MultiLayerNetwork net = new MultiLayerNetwork(conf);
  net.init();

I train it and then evaluate it. It works. Nevertheless, when I use:

 int[] prediction = net.predict(features);

Sometimes it returns unexpected predictions. It returns correct predictions such as 1, 2, ..., 5, but sometimes it returns numbers such as 9, 14, 12, ... These numbers do not correspond to any recognised prediction/label.

Why does this configuration return unexpected outputs?

Martin
    There is an example: https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j/dl4j-examples/tutorials/08.%20RNNs-%20Sequence%20Classification%20of%20Synthetic%20Control%20Data.zepp.ipynb – egorlitvinenko Jun 21 '18 at 15:46
  • 1
    Could you share the code with initialization of features? – egorlitvinenko Jun 21 '18 at 15:47
  • 1
    I use the official word2vecsentiment example. The only change is the number of possible outputs. – Martin Jun 25 '18 at 09:12
  • I use this example: https://github.com/deeplearning4j/dl4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent/word2vecsentiment changing the inputs and adding some possible outputs. – Martin Jun 27 '18 at 14:48

1 Answer


Don't use net.predict. Use net.output with Nd4j.argMax(outputOfNeuralNet, -1). net.predict should not be used (it was mainly intended for 2-D output).
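For reference, `Nd4j.argMax(output, -1)` just picks, along the last dimension, the index of the largest value. A minimal plain-Java sketch of that operation on a 2-D `[miniBatch, nClasses]` softmax output (the class count and probability values below are made-up illustrations, not real network output):

```java
import java.util.Arrays;

public class ArgMaxSketch {
    // Argmax along the last dimension of a [miniBatch, nClasses] matrix:
    // for each row, the index of the largest value = the predicted class.
    static int[] argMaxLastDim(double[][] output) {
        int[] labels = new int[output.length];
        for (int i = 0; i < output.length; i++) {
            int best = 0;
            for (int j = 1; j < output[i].length; j++) {
                if (output[i][j] > output[i][best]) best = j;
            }
            labels[i] = best;
        }
        return labels;
    }

    public static void main(String[] args) {
        // Two example rows of softmax probabilities over 5 classes.
        double[][] softmaxOut = {
            {0.10, 0.70, 0.05, 0.10, 0.05},  // largest at index 1
            {0.02, 0.03, 0.05, 0.10, 0.80}   // largest at index 4
        };
        System.out.println(Arrays.toString(argMaxLastDim(softmaxOut))); // [1, 4]
    }
}
```

With DL4J itself, the equivalent would be along the lines of `INDArray out = net.output(features);` followed by `Nd4j.argMax(out, -1)`; the resulting INDArray holds the predicted class indices.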

Adam Gibson
  • Please add an example of your solution. – Martin Jun 27 '18 at 16:15
  • These two functions are not analogous. The net.output result is an INDArray, while the net.predict result is an int array with the predicted classes. Can you give an example of how to use it? – Martin Jun 27 '18 at 16:23
  • 1
    The Nd4j.argMax outputs the indices for you. You can use an INDArray like you would any int array. The dl4j examples already cover this in a few places. One example being: https://github.com/deeplearning4j/dl4j-examples/blob/ce193994fec385cbe814a7ab9014d04f53c21b13/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/anomalydetection/MNISTAnomalyExample.java#L80 - change 1 to -1 though. -1 means "run on the last dimension no matter what it is". This follows numpy conventions. And to correct you: yes the 2 functions are analogous. My answer is just a more general version – Adam Gibson Jun 27 '18 at 16:25