I want to code my own version of the BFGS algorithm with a line search, using the TensorFlow 2.x API, for neural network optimization, and it should work in both eager and graph mode. The goal is also to avoid a workaround that wraps SciPy's BFGS or builds an interface around it. Sorry, but I don't know if I can post the full code of my attempts.
Multiple questions/points:
- From this post, there are two possible Optimizer parent classes: Optimizer and OptimizerV2. Which one should I inherit from?
- In the BFGS algorithm, several variables have to be stored and will change during the optimization process: the learning rate (step size), which changes with the line search; the approximate Hessian matrix of the concatenated neural network parameters; the gradients; and so on. Should they simply be initialized and set as plain attributes, or through the API (_set_hyper/_get_hyper for OptimizerV2; I don't know the equivalent for Optimizer)? A rough skeleton of what I mean is at the end of this post.
- Also, I don't feel the need to override the build method (if Optimizer is the parent class) or the _create_slots method (if OptimizerV2 is the parent class), since the gradients are approximated.
- Also, there are shape problems, since I work with flattened gradients that must be reshaped. For example, the following flatten function does not work in graph mode, since it 'breaks' the graph and raises an out-of-scope error:
# vec is the list of the NN weight tensors
def flatten(vec):
    temp = [None] * len(vec)
    for i, g in enumerate(vec):
        # reshape each weight tensor into a 1-D vector of length prod(shape)
        temp[i] = tf.reshape(g, (tf.math.reduce_prod(tf.shape(g)),))
    # concatenate all the flattened pieces into one 1-D vector
    return tf.concat(temp, axis=0)
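(Just to show the intent more compactly: the flattening can also be written as below. Since it does exactly the same thing, I assume it hits the same out-of-scope problem in graph mode.)

import tensorflow as tf

# compact equivalent: reshape every weight tensor to 1-D, then concatenate
def flatten_compact(vec):
    return tf.concat([tf.reshape(g, [-1]) for g in vec], axis=0)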
The same remark applies if I write a get_gradients function like the one below in order to recover the approximated gradients and then use the default apply_gradients method:
def get_gradients(self):
    shapes = self.shapes
    grads = [None] * len(shapes)
    counter = 0
    for i, shape in enumerate(shapes):
        # number of elements in this layer's weight tensor
        size = tf.reduce_prod(shape)
        # take the matching slice of the flat direction vector and restore the layer shape
        grads[i] = tf.reshape(self.current_dirs[counter : counter + size], shape)
        counter += size
    return grads
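An alternative I could imagine (hypothetical helper, assuming the layer shapes are known up front and static) would be to precompute the per-layer sizes once as Python ints and use tf.split instead of slicing with a running counter:

import numpy as np
import tensorflow as tf

# hypothetical sketch: build an "unflatten" closure from the trainable variables
def make_unflatten(trainable_vars):
    shapes = [v.shape.as_list() for v in trainable_vars]
    sizes = [int(np.prod(s)) for s in shapes]
    def unflatten(flat_vec):
        # split the flat vector into per-layer chunks, then restore each layer's shape
        parts = tf.split(flat_vec, sizes, axis=0)
        return [tf.reshape(p, s) for p, s in zip(parts, shapes)]
    return unflatten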
The question here is: should I add another attribute, say shaped_gradients, to store the reshaped gradients?
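To make the state-storage question from the second bullet point concrete, here is roughly the skeleton I have in mind for the OptimizerV2 case (assuming tf.keras.optimizers.Optimizer resolves to OptimizerV2, as it does in the TF 2 versions I am targeting). Names like BFGSOptimizer, inv_hessian and prev_flat_grads are mine, and this is only a sketch, not working code:

import numpy as np
import tensorflow as tf

# rough sketch only; the attribute names are hypothetical and the BFGS update logic is omitted
class BFGSOptimizer(tf.keras.optimizers.Optimizer):
    def __init__(self, model_variables, learning_rate=1.0, name="BFGS", **kwargs):
        super().__init__(name, **kwargs)
        # option A: store the line-search step size through the hyperparameter API
        self._set_hyper("learning_rate", learning_rate)
        # option B: store the BFGS state as plain attributes / non-trainable variables
        self.shapes = [v.shape.as_list() for v in model_variables]
        n = sum(int(np.prod(s)) for s in self.shapes)
        self.inv_hessian = tf.Variable(tf.eye(n), trainable=False)        # approximate inverse Hessian
        self.prev_flat_grads = tf.Variable(tf.zeros(n), trainable=False)  # previous flattened gradient
        self.current_dirs = tf.Variable(tf.zeros(n), trainable=False)     # current flattened search direction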