
I am trying to create a rather complex Lambda layer with many operations in Keras. After implementing it, I got a `ValueError: No gradients provided for any variable.`

While I am only using Keras backend operations to transform the data (except for one constant that I create with NumPy and later add onto a tensor), I understand that at least one of these operations must not be differentiable. Now I want to know how I can figure out which one it is, so I can find a workaround.

I don't want to publish any code yet, as it is part of a competition and I want to figure this out on my own. If that makes the problem hard to understand, please let me know. I can, however, give a list of all the functions I am using:

from tensorflow.keras import backend as K
from tensorflow.python.keras.layers import Lambda

...
def my_lambda_function(x):
    # uses:
    K.batch_dot
    K.cast
    K.clip
    K.concatenate
    K.one_hot
    K.reshape
    K.sum
    K.tile  # only applied to a constant created in numpy

...
# using the function in a model like this:
my_lambda_layer = Lambda(my_lambda_function)
result_tensor = my_lambda_layer(some_input)

I think `K.one_hot` could be problematic, but I want a way to know this for sure before I try making it differentiable.

McLP
  • Not enough info to give a proper answer. The fact that you use `one_hot` implies that you are working with discrete data though, and that is usually not differentiable. – xdurch0 Sep 10 '19 at 08:04
  • I am creating a layer that applies a kind of re-sorting to a vector, based on the output vector of the previous layer for positions. Therefore I need `one_hot` to map the output vector to a matrix so I can multiply it with the other one. An approximation would be fine though, as the discretization of course kills the gradients. I thought maybe TensorFlow might approximate gradients for those operations. – McLP Sep 10 '19 at 12:45

1 Answer


After a few hours of sleep, here is my simple solution: create a small test network and add a Lambda layer in which I try out all of the functions separately. This is, however, only an indirect way of finding the problem. Here is my code:

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Conv2DTranspose, Lambda
from tensorflow.keras.models import Sequential
from tensorflow.keras.datasets.mnist import load_data

from tensorflow.keras import backend as K

import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train, x_test = np.reshape(x_train, (-1, 28, 28, 1)), np.reshape(x_test, (-1, 28, 28, 1))


def test_function(x):
    x_int = K.cast(x, tf.int16)  # this was one of the gradient killers in my case
    return K.cast(x_int, tf.float16)


model = Sequential()
model.add(Input(shape=(28, 28, 1)))
model.add(Conv2D(10, (5, 5), padding='same', activation='relu'))
model.add(MaxPooling2D())
model.add(Lambda(test_function))
model.add(UpSampling2D())
model.add(Conv2DTranspose(4, (5, 5), padding='same', activation='relu'))
model.add(Conv2DTranspose(1, (3, 3), padding='same', activation='sigmoid'))

model.compile(optimizer='adam',
              loss='mse',
              metrics=['accuracy'])

model.fit(x_train, x_train, epochs=5)
model.evaluate(x_test, x_test)

This worked for me but I hope there are better solutions.
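A more direct check (a minimal sketch, assuming TF 2.x with eager execution) is to wrap the suspicious operation in a tf.GradientTape and see whether the gradient comes back as None:

import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.Variable(tf.random.uniform((4, 3)))

with tf.GradientTape() as tape:
    # swap in the operation(s) under suspicion here, one at a time
    y = K.sum(K.cast(K.cast(x, tf.int16), tf.float32))

# None means no gradient flows through the tested operation(s)
print(tape.gradient(y, x))

This way each backend function from the question can be tested on its own, without building a full model.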

Btw, I can approximate a floor operation (which also kills the gradients) using these functions:

import math


def a(x):
    # one smoothing step: pushes x towards the nearest half-integer k + 0.5
    two_pi = 2 * math.pi
    two_pi_x = x * two_pi
    sine = K.sin(two_pi_x)
    numerator = sine + two_pi_x
    return numerator / two_pi


def approximated_floor(x):
    # three smoothing steps, then shift down by 0.5 to approximate floor(x)
    x2 = a(a(a(x))) - 0.5
    return x2
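As a quick, purely illustrative sanity check (reusing the np and K imports from the snippet above), the approximation can be compared against np.floor:

xs = np.linspace(0.1, 4.9, 9, dtype=np.float32)
print(K.eval(approximated_floor(K.constant(xs))))  # approximate floor values
print(np.floor(xs))                                # exact floor values

The approximation is least accurate for inputs close to integers, where the sine-based smoothing converges slowly.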
McLP