I compared r2 scores across different numbers of hidden layers and hidden units using a for-loop, and selected the configuration with a high score and an acceptable convergence time.
However, re-running the calculation with the selected numbers of layers and units yields different r2 scores.
Even with the numbers of layers and units fixed, simply re-running the loop produces different r2 scores, as shown below. [same number of layers and units, but different results][1]
I can think of two possible reasons: first, the session is not properly (re)initialized inside the for-loop (see the sketch after the main code below), and second, reproducibility is not guaranteed in the neural network itself (illustrated just below).
I have searched other articles on both issues, but I still could not find an answer, so I am asking here. Thank you in advance for your help.
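To illustrate what I mean by the second point: as far as I understand, with only a graph-level seed, each op derives its own seed from the order in which ops are added to the graph, so two identical ops in one graph draw different streams, and adding extra ops can shift the streams of ops created later. A minimal sketch (TF 1.x; the toy ops are just for illustration):

```
import tensorflow as tf

tf.reset_default_graph()
tf.set_random_seed(777)      # graph-level seed only, no op-level seeds
a = tf.random_uniform([2])   # op seed is derived from the graph seed plus creation order
b = tf.random_uniform([2])   # identical call, later creation order -> different stream
with tf.Session() as sess:
    print(sess.run(a))
    print(sess.run(b))       # differs from a; both repeat only if the graph is rebuilt identically
```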
To eliminate randomness during the data split, scikit-learn is used with the same random_state on every run.
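For reference, the split is along these lines (a sketch with placeholder data; the sizes and the random_state value stand in for my actual settings):

```
import numpy as np
from sklearn.model_selection import train_test_split

x = np.random.rand(1000, 8)   # placeholder features
y = np.random.rand(1000, 3)   # placeholder targets

# Fixed random_state, so the split itself is identical on every run.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42)
x_train, x_valid, y_train, y_valid = train_test_split(
    x_train, y_train, test_size=0.25, random_state=42)
```

The main code is below: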
```
for i in range(m_layer):  # outer loop over layer counts (m_layer: assumed upper bound)
    n_layer = i + 1
    x_con[i] = n_layer
    for j in range(m_neuron):
        n_neuron = 2**(j + 1)
        y_con[j] = n_neuron
        print('n_layer: ', n_layer, 'n_neuron:', n_neuron)

        # Launch the graph in a session.
        sess = tf.Session()
        tf.set_random_seed(777)  # for reproducibility

        # Create model and solver
        m1 = FCNN(str(i) + str(j), n_feature, n_output, n_layer, n_neuron,
                  learning_rate, use_batchnorm=True)
        m1_solver = Solver(sess, m1)

        # Initialize global variables in the graph
        init = tf.global_variables_initializer()
        sess.run(init)

        cost_val_old = np.full((n_output), 0.)
        for step in range(n_epoch):
            cost_val, y_train_predict, _ = m1_solver.train(x_train_scaled, y_train_scaled)
            diff_tmp = m1_solver.convergence_criterion(cost_val, cost_val_old)
            cost_val_old = cost_val
            if (step % n_print == 0 and step > 0) or diff_tmp <= tol:
                print("{0} Cost: {1} Diff: {2:.10f}".format(step, cost_val, diff_tmp))
                if diff_tmp <= tol:
                    cost_train[j, i, :] = cost_val[0:3]
                    iter_train[j, i] = step
                    break

        y_valid_predict = np.squeeze(np.array(m1_solver.predict(x_valid_scaled)), axis=0)
        y_test_predict = np.squeeze(np.array(m1_solver.predict(x_test_scaled)), axis=0)

        # Evaluate r2 score for each output column
        for k in range(n_output):
            r2_train_tmp = m1_solver.evaluate_r2(y_train_scaled[:, k], y_train_predict[:, k])
            r2_valid_tmp = m1_solver.evaluate_r2(y_valid_scaled[:, k], y_valid_predict[:, k])
            r2_test_tmp = m1_solver.evaluate_r2(y_test_scaled[:, k], y_test_predict[:, k])
            r2_train[j, i, k] = r2_train_tmp[0]
            r2_valid[j, i, k] = r2_valid_tmp[0]
            r2_test[j, i, k] = r2_test_tmp[0]

        # Close session
        sess.close()
```
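Regarding the first point: since tf.Session() is created inside the loop without clearing the default graph, I suspect every iteration keeps adding ops to the same graph. Below is a minimal, self-contained sketch of the per-iteration reset I have been considering (TF 1.x; run_once and the toy variable are illustrative, not my actual model):

```
import numpy as np
import tensorflow as tf

def run_once(seed=777):
    """Build and run a tiny graph from scratch; should be deterministic on CPU."""
    tf.reset_default_graph()   # drop all ops left over from earlier iterations
    tf.set_random_seed(seed)   # seed the fresh graph before any op is created
    w = tf.get_variable('w', shape=[4],
                        initializer=tf.glorot_uniform_initializer())
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        return sess.run(w)

print(np.allclose(run_once(), run_once()))  # True on CPU
```

Even with such a reset, I understand that runs on a GPU can still differ because some kernels are nondeterministic.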
The model class is below. It is mainly based on https://github.com/hunkim/DeepLearningZeroToAll/blob/master/lab-10-6-mnist_nn_batchnorm.ipynb:
```
class FCNN:
    def __init__(self, name, n_feature, n_output, n_layer, n_neuron, lr, use_batchnorm=True):
        with tf.variable_scope(name):
            self.x = tf.placeholder(tf.float32, shape=[None, n_feature], name='x')
            self.y = tf.placeholder(tf.float32, shape=[None, n_output], name='y')
            self.mode = tf.placeholder(tf.bool, name='train_mode')
            self.y_target = tf.placeholder(tf.float32, shape=[None])
            self.y_prediction = tf.placeholder(tf.float32, shape=[None])
            self.cost_new = tf.placeholder(tf.float32, shape=[n_output])
            self.cost_old = tf.placeholder(tf.float32, shape=[n_output])

            # Loop over hidden layers
            net = self.x
            hidden_dims = np.full((n_layer), n_neuron)
            for i, h_dim in enumerate(hidden_dims):
                with tf.variable_scope('layer{}'.format(i)):
                    net = tf.layers.dense(net, h_dim)
                    if use_batchnorm:
                        net = tf.layers.batch_normalization(net, training=self.mode)
                    net = tf.nn.relu(net)

            # Attach the final fully connected output layer
            net = tf.contrib.layers.flatten(net)
            self.hypothesis = tf.layers.dense(net, n_output)
            self.cost = tf.reduce_mean(tf.square(self.hypothesis - self.y), axis=0, name='cost')

            # When using batch-normalization layers, the moving-average update ops
            # must be added manually, because they are not part of the train op by default.
            update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope=name)
            with tf.control_dependencies(update_ops):
                optimizer = tf.train.AdamOptimizer(learning_rate=lr)
                self.train_op = optimizer.minimize(self.cost)

            # Convergence criterion: Euclidean distance between consecutive cost vectors
            self.diff = tf.sqrt(tf.reduce_sum(tf.square(self.cost_new - self.cost_old)))

            # R2 score
            total_error = tf.reduce_sum(tf.square(self.y_target - tf.reduce_mean(self.y_target)))
            unexplained_error = tf.reduce_sum(tf.square(self.y_target - self.y_prediction))
            self.acc_R2 = 1. - unexplained_error / total_error
```
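The acc_R2 op at the end is meant to implement the usual definition R2 = 1 - SS_res / SS_tot. As a quick sanity check on toy numbers (NumPy/scikit-learn only, not part of the model):

```
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained error
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total error
print(1.0 - ss_res / ss_tot)     # 0.9486081370449679
print(r2_score(y_true, y_pred))  # identical value
```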
[1]: https://i.stack.imgur.com/Fo42X.png