How does TensorFlow know which variables to change for optimization?

Question

Code taken from:-http://adventuresinmachinelearning.com/python-tensorflow-tutorial/

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
# Python optimisation variables
learning_rate = 0.5
epochs = 10
batch_size = 100

# declare the training data placeholders
# input x - for 28 x 28 pixels = 784
x = tf.placeholder(tf.float32, [None, 784])
# now declare the output data placeholder - 10 digits
y = tf.placeholder(tf.float32, [None, 10])
# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')
# calculate the output of the hidden layer
hidden_out = tf.add(tf.matmul(x, W1), b1)
hidden_out = tf.nn.relu(hidden_out)
# now calculate the hidden layer output - in this case, let's use a softmax activated
# output layer
y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped)
                         + (1 - y) * tf.log(1 - y_clipped), axis=1))
# add an optimiser
optimiser = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)
# finally setup the initialisation operator
init_op = tf.global_variables_initializer()

# define an accuracy assessment operation
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# start the session
with tf.Session() as sess:
   # initialise the variables
   sess.run(init_op)
   total_batch = int(len(mnist.train.labels) / batch_size)
   for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimiser, cross_entropy], 
                         feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))
   print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))

I wanted to ask, how does tensorflow recognize the parameters it needs to optimize , like in the above code we need to optimize w1,w2,b1 & b2 but we never specified that anywhere. We did ask GradientDescentOptimizer to minimize cross_entropy but we never told it that it would have to change the values of w1,w2,b1&b2 in order to do so , So how did it know the parameters on which cross_entropy depended upon?

score 3 · Accepted Answer · answered Aug 09 '18 at 06:05

The answer by Cory Nezin is only partially correct, and could lead to wrong assumptions!

You actually do specify which parameters are optimized (=trainable), namely by doing this:

# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')

In short, TensorFlow will only update tf.Variables. If you would use something like tf.Variable(...,trainable=False), you would not get any updates, regardless of what "network depends on". You would have still specified it, and the network would still propagate through that part, but you would never receive any updates for that specific variable.

Cory's answer is correct in the way that the network does automatically recognize what values to update it with, but you specify what has to be defined/updated first!

score 2 · Answer 2 · answered Aug 09 '18 at 00:28

2

TensorFlow works on the premise of something called a computational graph. Essentially, whenever you say something like:

hidden_out = tf.add(tf.matmul(x, W1), b1)

TensorFlow says ok, so that output clearly depends on W1, I'll connect an edge from "hidden_out" to W1. This same process happens for y_, y_clipped, and cross_entropy. So in the end you have a graph which connects cross_entropy to W1. Pick your favorite graph traversal algorithm and TensorFlow finds the connection between cross entropy and W1.

answered Aug 09 '18 at 00:28

Cory Nezin

1,551
10
22

Only partially correct. In fact, the variables do get initialized by the user, and therefore it's not just some "black magic" by the network. See my answer for more details. – dennlinger Aug 09 '18 at 06:05
@dennlinger: this answer gives a missing piece though, because in order to be able to optimize a variable you need to know exactly how the variable influences the loss function. In other words: knowing which variables to optimize is not enough, you also need to know how they influence the result you are optimizing. – henon Mar 06 '19 at 07:19

How does TensorFlow know which variables to change for optimization?

2 Answers2