
I am feeding CNN features into a GPflow model. I am posting chunks of code from my program here. I am using tape.gradient with the Adam optimizer (scheduled learning rate). My accuracy gets stuck at 47%, yet surprisingly the loss keeps decreasing. It is very strange. I have debugged the program: the CNN features are fine, but the GP model is not learning. Could you please check the training loop and let me know where I am going wrong?

def optimization_step(gp_model: gpflow.models.SVGP, image_data, labels):
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        tape.watch(gp_model.trainable_variables)

        cnn_feat = cnn_model(image_data, training=False)
        cnn_feat = tf.cast(cnn_feat, dtype=default_float())
        labels = tf.cast(labels, dtype=np.int64)

        data = (cnn_feat, labels)
        loss = gp_model.training_loss(data)

        gp_grads = tape.gradient(loss, gp_model.trainable_variables)

    gp_optimizer.apply_gradients(zip(gp_grads, gp_model.trainable_variables))

    return loss, cnn_feat

The training loop is:

def simple_training_loop(gp_model: gpflow.models.SVGP, epochs: int = 3, logging_epoch_freq: int = 10):
    total_loss = []
    features = []

    tf_optimization_step = tf.function(optimization_step, autograph=False)

    for epoch in range(epochs):
        lr.assign(max(args.learning_rate_clip, args.learning_rate * (args.decay_rate ** epoch)))
        data_loader.shuffle_data(args.is_training)

        for b in range(data_loader.n_batches):
            batch_x, batch_y = data_loader.next_batch(b)
            batch_x = tf.convert_to_tensor(batch_x)
            batch_y = tf.convert_to_tensor(batch_y)

            loss, features_CNN = tf_optimization_step(gp_model, batch_x, batch_y)
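(`lr` and `gp_optimizer` are not defined in the snippets above; a minimal sketch of how they could be set up so that the `lr.assign(...)` schedule takes effect. The exact construction is an assumption, not code from the question:)

    import tensorflow as tf

    # Assumed definitions for the free variables used above.
    # `lr` must be a tf.Variable so that lr.assign(...) in the epoch loop
    # is picked up by the optimizer on subsequent steps.
    lr = tf.Variable(args.learning_rate, trainable=False)
    gp_optimizer = tf.optimizers.Adam(learning_rate=lr)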

I am restoring the CNN weights from checkpoints saved during transfer learning.
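A minimal sketch of that restore step, assuming a standard tf.train.Checkpoint (the checkpoint directory name is illustrative, not from the question):

    # Sketch: restore the CNN weights saved during transfer learning.
    # `checkpoint_dir` is an illustrative name.
    ckpt = tf.train.Checkpoint(model=cnn_model)
    ckpt.restore(tf.train.latest_checkpoint(checkpoint_dir)).expect_partial()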

With more epochs, the loss continues to decrease, but the accuracy starts decreasing as well.

The GP model declaration is as follows:

    kernel = gpflow.kernels.Matern32() + gpflow.kernels.White(variance=0.01)

    invlink = gpflow.likelihoods.RobustMax(C)
    likelihood = gpflow.likelihoods.MultiClass(C, invlink=invlink)
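The SVGP construction itself is not shown above; a sketch of how these pieces are typically assembled. The number of inducing points, the feature dimension, and the dataset size are assumptions:

    import numpy as np
    import gpflow

    M = 100   # number of inducing points (assumed)
    D = 1024  # dimensionality of the CNN features (assumed)
    N = 50000 # total number of training points (assumed)
    Z = np.random.randn(M, D)  # inducing locations, often initialised via k-means on features

    gp_model = gpflow.models.SVGP(
        kernel=kernel,
        likelihood=likelihood,
        inducing_variable=Z,
        num_latent_gps=C,  # one latent GP per class for MultiClass
        num_data=N,        # total training set size; needed to scale the minibatch ELBO
    )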

The test function:

    cnn_feat = cnn_model(test_x, training=False)
    cnn_feat = tf.cast(cnn_feat, dtype=default_float())

    mean, var = gp_model.predict_f(cnn_feat)

    preds = np.argmax(mean, 1).reshape(test_labels.shape)
    correct = (preds == test_labels.numpy().astype(int))
    acc = np.average(correct.astype(float)) * 100
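An alternative accuracy check that goes through the likelihood rather than the latent mean, using predict_y to get per-class probabilities (a sketch with the same variable names as above):

    # Sketch: accuracy via class probabilities from the likelihood.
    probs, _ = gp_model.predict_y(cnn_feat)  # shape (N, C) class probabilities
    preds = np.argmax(probs.numpy(), axis=1).reshape(test_labels.shape)
    acc = np.mean(preds == test_labels.numpy().astype(int)) * 100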
irum
  • Would you mind posting an executable minimal failing example? Also, by "the gp model is not learning" do you mean i) the accuracy of the model is not satisfactory or ii) the parameters of the model are not changing? Thanks. – Vincent Dutordoir May 31 '20 at 08:57
  • To be honest, it's not possible for me to post the code here; it may work for simple examples, but the work I am doing is complicated. I am feeding features from an already-trained CNN into the GP model. I can share the results of the output. – irum May 31 '20 at 09:17
  • By accuracy I mean how accurately the model classifies images. I am working on multi-class image classification. – irum May 31 '20 at 09:18
  • @VincentDutordoir Sorry, I am new to the gpflow library. Can you please let me know how I can check whether the parameters of the model are changing or not? (A sketch for this follows these comments.) – irum May 31 '20 at 09:21
  • @VincentDutordoir Yes, the parameters are changing, but the problem is still there. Can you please just check whether the training loop is correctly written? – irum Jun 01 '20 at 18:07
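(A quick way to check whether the GP parameters are changing between optimisation steps, as asked in the comments above; a sketch using GPflow's gpflow.utilities helpers, with illustrative variable names:)

    import numpy as np
    from gpflow.utilities import parameter_dict, print_summary

    print_summary(gp_model)  # tabulated overview of all model parameters

    before = {k: np.array(v.numpy(), copy=True) for k, v in parameter_dict(gp_model).items()}
    # ... run a few optimisation steps here ...
    after = {k: v.numpy() for k, v in parameter_dict(gp_model).items()}
    for k in before:
        print(k, "changed:", not np.allclose(before[k], after[k]))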

1 Answer


Can you please just check whether the training loop is correctly written

The training loop looks fine. However, there are bits that should be modified for clarity and for optimisation's sake.

def simple_training_loop(gp_model: gpflow.models.SVGP, epochs: int = 3, logging_epoch_freq: int = 10):
    total_loss = []
    features = []

    @tf.function
    def compute_cnn_feat(x: tf.Tensor) -> tf.Tensor:
        return tf.cast(cnn_model(x, training=False), dtype=default_float())

    @tf.function
    def optimization_step(cnn_feat: tf.Tensor, labels: tf.Tensor):  # Change 1.
        with tf.GradientTape(watch_accessed_variables=False) as tape:
            tape.watch(gp_model.trainable_variables)
            data = (cnn_feat, labels)
            loss = gp_model.training_loss(data)
        gp_grads = tape.gradient(loss, gp_model.trainable_variables)  # Change 2.
        gp_optimizer.apply_gradients(zip(gp_grads, gp_model.trainable_variables))
        return loss

    for epoch in range(epochs):
        lr.assign(max(args.learning_rate_clip, args.learning_rate * (args.decay_rate ** epoch)))
        data_loader.shuffle_data(args.is_training)

        for b in range(data_loader.n_batches):
            batch_x, batch_y = data_loader.next_batch(b)
            batch_x = tf.convert_to_tensor(batch_x)
            batch_y = tf.convert_to_tensor(batch_y, dtype=default_float())

            cnn_feat = compute_cnn_feat(batch_x)  # Change 3.
            loss = optimization_step(cnn_feat, batch_y)

Change 1. The signature of a function wrapped with tf.function should not contain mutable Python objects (such as the GPflow model); pass only tensors and capture the model by closure.
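A minimal, self-contained sketch of why: tf.function caches traces keyed on its Python-object arguments, so each distinct object triggers a retrace (the toy class stands in for a GPflow model):

    import tensorflow as tf

    class SomeModel:  # stands in for a mutable object such as a GPflow model
        pass

    @tf.function
    def step_bad(model, x):  # a Python object in the signature
        print("tracing")     # printed once per trace, not once per call
        return x * 2.0

    step_bad(SomeModel(), tf.constant(1.0))  # prints "tracing"
    step_bad(SomeModel(), tf.constant(1.0))  # prints "tracing" again: retraced

The corrected loop above avoids this by closing over gp_model and passing only tensors.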

Change 2. The gradient tape tracks all computations inside the context manager, including the computation of the gradients themselves, i.e. the tape.gradient(...) call. That means your code performs an unnecessary calculation.
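A minimal, runnable illustration of the difference (the variable and loss are toy stand-ins):

    import tensorflow as tf

    v = tf.Variable(3.0)

    # Inside the context: the tape is still recording while the gradients
    # are computed, so that computation itself is tracked (wasted work here).
    with tf.GradientTape() as tape:
        loss = v ** 2
        grads_inside = tape.gradient(loss, [v])

    # Outside the context: recording stops at the loss, which is all that is
    # needed for a single first-order gradient.
    with tf.GradientTape() as tape:
        loss = v ** 2
    grads_outside = tape.gradient(loss, [v])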

Change 3. For the same reason as in "Change 2", I moved the CNN feature extraction outside of the gradient tape.

Artem Artemev
  • Thank you for your reply and suggestions. I changed the code accordingly, but the results are still the same. I do not understand why the model is not learning and why it behaves so strangely. I am using gp_model.predict_f(test_data) to calculate accuracy. Ah! I have been stuck on this for a long time now. It should be very straightforward. – irum Jun 01 '20 at 14:34
  • I am using the SVGP model. – irum Jun 01 '20 at 15:58
  • Are you trying to implement GPDNN from https://arxiv.org/pdf/1707.02476.pdf? Honestly, I don't believe there is an error in the predict method. However, there might be an error in how you compute the accuracy or log the loss metric. In a situation where you cannot provide details of your setup or a minimal failing example, it is very hard to tell you more and give a precise answer to your question. – Artem Artemev Jun 01 '20 at 21:32
  • I have edited the question by adding the accuracy function and the GP model declaration. – irum Jun 01 '20 at 23:31
  • Yes, I am reproducing the above-mentioned paper. I am facing many issues. E.g. if I add more layers on top of the CNN model for transfer learning, then the GP model does not learn and gives only 10% accuracy. – irum Jun 11 '20 at 00:27