I compared the R² score for different numbers of hidden layers and hidden units using a for loop, and selected the number of layers and units that gave a high score with an acceptable convergence time.

However, recalculating with the selected layers and units yields different R² scores.

Even with the number of layers and units fixed, simply re-running the loop produces different R² scores, as shown below: [same number of layers and units, but different results][1]

I can think of two possible reasons: first, the session (or graph) may not be properly re-initialized inside the for loop, and second, reproducibility of neural-network training is not guaranteed.
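
To make the first point concrete, a per-iteration reset could look roughly like this (a minimal sketch for TF 1.x; the helper name is mine and it is not part of my code below):

    import random
    import numpy as np
    import tensorflow as tf

    def reset_graph_and_seeds(seed=777):
        # Hypothetical helper: start each loop iteration from a clean state
        tf.reset_default_graph()   # drop all ops built in previous iterations
        tf.set_random_seed(seed)   # graph-level seed on the fresh default graph
        np.random.seed(seed)       # NumPy randomness (e.g. shuffling)
        random.seed(seed)          # Python's built-in RNG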

I searched other articles for solutions to both issues, but I still couldn't find an answer, so I'm asking here. Thank you in advance for your help.

To eliminate randomness during the data split, scikit-learn was used with the same random_state every time; a sketch of what I mean follows, and then the main code.
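
A minimal sketch of the kind of split call I mean (the function and the variable names here are my assumption; the actual split code is not included in this post):

    from sklearn.model_selection import train_test_split

    # Assumed split call: variable names and test_size are illustrative,
    # only the fixed random_state matters for this question
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=42)

And the main loop itself: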

    # (this block runs inside an outer loop over i, the hidden-layer index;
    #  the outer loop is not shown here)
    n_layer = i+1
    x_con[i] = n_layer
    for j in range(m_neuron):
        n_neuron = 2**(j+1)
        y_con[j] = n_neuron
        print('n_layer: ',n_layer,'n_neuron:',n_neuron)

        # Launch the graph in a session.
        sess = tf.Session()
        tf.set_random_seed(777)  # for reproducibility

        # Create model and solver
        m1   = FCNN(str(i)+str(j), n_feature, n_output, n_layer, n_neuron, learning_rate, use_batchnorm=True)
        m1_solver = Solver(sess, m1)

        # Initializes global variables in the graph
        init = tf.global_variables_initializer()
        sess.run(init)

        cost_val_old = np.full((n_output), 0.)
        for step in range(n_epoch):
            cost_val, y_train_predict, _ = m1_solver.train(x_train_scaled, y_train_scaled)
            diff_tmp = m1_solver.convergence_criterion(cost_val,cost_val_old)
            cost_val_old = cost_val

            if (step % n_print == 0 and step > 0) or diff_tmp <= tol:
                print("{0} Cost: {1} Diff: {2:.10f}".format(step,cost_val,diff_tmp))
                if diff_tmp <= tol:
                    cost_train[j,i,:] = cost_val[0:3]
                    iter_train[j,i]   = step
                    break

        y_valid_predict = np.squeeze(np.array(m1_solver.predict(x_valid_scaled)), axis=0)
        y_test_predict  = np.squeeze(np.array(m1_solver.predict(x_test_scaled)), axis=0)

        # Evaluate r2 score
        for k in range(n_output):
            r2_train_tmp = m1_solver.evaluate_r2(y_train_scaled[:,k], y_train_predict[:,k])
            r2_valid_tmp = m1_solver.evaluate_r2(y_valid_scaled[:,k], y_valid_predict[:,k])
            r2_test_tmp  = m1_solver.evaluate_r2(y_test_scaled[:,k],  y_test_predict[:,k])
            r2_train[j,i,k] = r2_train_tmp[0]
            r2_valid[j,i,k] = r2_valid_tmp[0]
            r2_test[j,i,k]  = r2_test_tmp[0]

        # Close session
        sess.close()

The constructor of the model class is also shown below. The class is mainly based on https://github.com/hunkim/DeepLearningZeroToAll/blob/master/lab-10-6-mnist_nn_batchnorm.ipynb.

    def __init__(self, name, n_feature, n_output, n_layer, n_neuron, lr, use_batchnorm=True):
        with tf.variable_scope(name):
            self.x = tf.placeholder(tf.float32, shape=[None, n_feature], name='x')
            self.y = tf.placeholder(tf.float32, shape=[None, n_output], name='y')
            self.mode = tf.placeholder(tf.bool, name='train_mode')
            self.y_target     = tf.placeholder(tf.float32, shape=[None])
            self.y_prediction = tf.placeholder(tf.float32, shape=[None])
            self.cost_new     = tf.placeholder(tf.float32, shape=[n_output])
            self.cost_old     = tf.placeholder(tf.float32, shape=[n_output])

            # Loop over hidden layers
            net = self.x
            hidden_dims = np.full((n_layer), n_neuron)
            for i, h_dim in enumerate(hidden_dims):
                with tf.variable_scope('layer{}'.format(i)):
                    net = tf.layers.dense(net, h_dim)

                    if use_batchnorm:
                        net = tf.layers.batch_normalization(net, training=self.mode)

                    net = tf.nn.relu(net)

            # Attach fully connected layers
            net = tf.contrib.layers.flatten(net)
            self.hypothesis = tf.layers.dense(net, n_output)

            self.cost = tf.reduce_mean(tf.square(self.hypothesis - self.y),axis=0, name='cost')

            # When using batch normalization layers, the update ops for the
            # moving averages must be added manually, because they are not
            # run automatically as part of the training op
            update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope=name)
            with tf.control_dependencies(update_ops):
                optimizer = tf.train.AdamOptimizer(learning_rate=lr)
                self.train_op = optimizer.minimize(self.cost)

            # convergence criterion
            self.diff = tf.sqrt(tf.reduce_sum(tf.square(self.cost_new - self.cost_old)))

            # R2 score
            total_error       = tf.reduce_sum(tf.square(self.y_target - tf.reduce_mean(self.y_target)))
            unexplained_error = tf.reduce_sum(tf.square(self.y_target - self.y_prediction))
            self.acc_R2       = 1. - unexplained_error/total_error
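
As an aside on the second possible reason above (reproducibility), the dense layers in this class use the default initializers. An explicitly seeded initializer could look roughly like the snippet below; this is only an illustration and is not part of my code (assuming TF 1.x):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=[None, 4], name='x')
    # Illustration only: a dense layer whose kernel initializer is given an
    # explicit seed (not part of the FCNN class above)
    net = tf.layers.dense(
        x, 8, kernel_initializer=tf.glorot_uniform_initializer(seed=777))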


  [1]: https://i.stack.imgur.com/Fo42X.png
  • [This](https://ai.stackexchange.com/questions/17412/how-to-reproduce-neural-network-training-with-keras/17420#17420) may be of help – OverLordGoldDragon Jan 09 '20 at 14:59
  • [This](https://stackoverflow.com/questions/50659482/why-cant-i-get-reproducible-results-in-keras-even-though-i-set-the-random-seeds/52897289#comment103594717_52897289) might help too. – Siddhant Tandon Jan 09 '20 at 15:52
