Not sure if still relevant, but I'll try anyway.
Generally, what I do in a situation like that is the following:
- Populate the (default) graph according to the model you are building, e.g. for the first training step just create the first convolutional layer W1 you mention. Once training of the first layer is finished, you can save the model, then reload it and add the ops required for the second layer W2. Alternatively, you can just build the whole graph for W1 from scratch again directly in the code and then add the ops for W2. If you use the restore mechanism provided by TensorFlow, you have the advantage that the weights for W1 are already the pre-trained ones (see the sketch after this list). If you don't use the restore mechanism, you will have to set the W1 weights manually, e.g. as shown in the snippet further below.
- Then, when you set up the training op, you can pass a list of variables as var_list to the optimizer, which explicitly tells the optimizer which parameters to update in order to minimize the loss. If this is set to None (the default), it just uses whatever it finds in tf.trainable_variables(), which in turn is a collection of all tf.Variables that are trainable. Maybe check this answer, too, which basically says the same thing.
- When using the var_list argument, graph collections come in handy. E.g. you could create a separate graph collection for every layer you want to train. The collection would contain the trainable variables for each layer, and then you could very easily just retrieve the required collection and pass it as the var_list argument (see example below and/or the remark in the above linked documentation).
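As promised above, here is a minimal sketch of the save-then-extend workflow from the first bullet point. It assumes the TF1 graph/session API used throughout this answer; the checkpoint path and variable shapes are made up for illustration:

import tensorflow as tf

# Phase 1: build and train only W1, then save a checkpoint.
weights1 = tf.get_variable('weights1', shape=[5, 5, 1, 32])
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run the training ops for the first layer here ...
    saver.save(sess, './layer1.ckpt')  # hypothetical checkpoint path

# Phase 2: rebuild W1, add W2, and restore only the pre-trained W1 weights.
tf.reset_default_graph()
weights1 = tf.get_variable('weights1', shape=[5, 5, 1, 32])
weights2 = tf.get_variable('weights2', shape=[5, 5, 32, 64])
restorer = tf.train.Saver(var_list=[weights1])  # restore W1 only
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initializes W2 (and W1, before the restore)
    restorer.restore(sess, './layer1.ckpt')      # overwrite W1 with the pre-trained values
    # ... now set up and run the training op for the second layer ...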
How to override the value of a variable: name is the name of the variable to be overridden, value is an array of the appropriate size and type, and sess is the session:
variable = tf.get_default_graph().get_tensor_by_name(name)  # look up the variable's tensor by name
sess.run(tf.assign(variable, value))                        # write the new value into it
Note that the name needs an additional :0 at the end, so e.g. if the weights of your layer are called 'weights1', the name in the example should be 'weights1:0'.
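For example, assuming the pre-trained values sit in a NumPy array and sess is an open session (file name and variable name here are just placeholders), the override could be used like this:

import numpy as np

value = np.load('pretrained_weights1.npy')  # hypothetical file holding an array matching the variable's shape and dtype
variable = tf.get_default_graph().get_tensor_by_name('weights1:0')
sess.run(tf.assign(variable, value))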
To add a tensor to a custom collection: Use something along the following lines:
tf.add_to_collection('layer1_tensors', weights1)
tf.add_to_collection('layer1_tensors', some_other_trainable_variable)
Note that the first line creates the collection because it does not yet exist and the second line adds the given tensor to the existing collection.
How to use the custom collection: Now you can do something like this:
# loss = some tensorflow op computing the loss
var_list = tf.get_collection_ref('layer1_tensors')
optim = tf.train.AdamOptimizer().minimize(loss=loss, var_list=var_list)
You could also use tf.get_collection('layer1_tensors'), which would return you a copy of the collection.
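Putting the pieces together for two layers, one possible setup (purely illustrative: the shapes and the dummy loss are made up) could look like this:

# One collection per layer, filled at graph construction time.
weights1 = tf.get_variable('weights1', shape=[5, 5, 1, 32])
tf.add_to_collection('layer1_tensors', weights1)
weights2 = tf.get_variable('weights2', shape=[5, 5, 32, 64])
tf.add_to_collection('layer2_tensors', weights2)

# Dummy loss for illustration; in practice this is your model's loss op.
loss = tf.reduce_sum(tf.square(weights1)) + tf.reduce_sum(tf.square(weights2))

# Each training op only updates the variables in its own collection.
train_layer1 = tf.train.AdamOptimizer().minimize(loss=loss, var_list=tf.get_collection('layer1_tensors'))
train_layer2 = tf.train.AdamOptimizer().minimize(loss=loss, var_list=tf.get_collection('layer2_tensors'))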
Of course, if you don't want to do any of this, you could just use trainable=False when creating the graph for all variables you don't want to be trainable, as you hinted at in your question. However, I don't like that option too much, because it requires you to pass booleans into the functions that populate your graph, which is easily overlooked and thus error-prone. Also, even if you decide to do it like that, you would still have to restore the non-trainable variables manually.
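For completeness, that variant would look something like this (just a sketch; in a real model the boolean would typically be threaded through your graph-building functions):

# Excluded from tf.trainable_variables(), so an optimizer with the default var_list ignores it.
weights1 = tf.get_variable('weights1', shape=[5, 5, 1, 32], trainable=False)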