
I implemented a network with TensorFlow and created the model with the following code:

def multilayer_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights["h1"]), biases["b1"])
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.add(tf.matmul(layer_1, weights["out"]), biases["out"])
    return out_layer

I initialize the weights and the biases as follows:

weights = {
    "h1": tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    "out": tf.Variable(tf.random_normal([n_hidden_1, n_classes]))
    }

biases = {
    "b1": tf.Variable(tf.random_normal([n_hidden_1])),
    "out": tf.Variable(tf.random_normal([n_classes]))
    }

Now I want to use a custom activation function, so I replaced tf.nn.relu(layer_1) with a custom activation function, custom_sigmoid(layer_1), which is defined as:

def custom_sigmoid(x):
    # one trainable slope per feature of x
    beta = tf.Variable(tf.random_normal([int(x.get_shape()[1])]))
    return tf.sigmoid(beta * x)

Here beta is supposed to be a trainable parameter. I realized that this cannot work, since I don't know how to implement the derivative in a way that TensorFlow can use.

Question: How can I use a custom activation function in TensorFlow? I would really appreciate any help.

Gilfoyle

2 Answers

I will try to answer my own question. Here is what I did and what seems to work:

First I define a custom activation function:

def custom_sigmoid(x, beta_weights):
    return tf.sigmoid(beta_weights*x)

Then I create weights for the activation function:

beta_weights = {
    "beta1": tf.Variable(tf.random_normal([n_hidden_1]))
    }

Finally, I add beta_weights as an argument to my model function and replace the activation function in multilayer_perceptron():

def multilayer_perceptron(x, weights, biases, beta_weights):
    layer_1 = tf.add(tf.matmul(x, weights["h1"]), biases["b1"])
    #layer_1 = tf.nn.relu(layer_1) # Old
    layer_1 = custom_sigmoid(layer_1, beta_weights["beta1"]) # New
    out_layer = tf.add(tf.matmul(layer_1, weights["out"]), biases["out"])
    return out_layer
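
For completeness, here is a minimal sketch of how this might be wired into a training graph. The placeholders, the mean-squared-error cost, the optimizer and the learning rate are assumptions for illustration, not part of the original post:

import tensorflow as tf

# Assumes weights, biases and beta_weights are defined as above,
# and that n_input and n_classes match your data.
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

pred = multilayer_perceptron(x, weights, biases, beta_weights)
cost = tf.reduce_mean(tf.square(pred - y))                          # assumed cost; use your own
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)   # assumed optimizer/learning rate

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # sess.run(train_op, feed_dict={x: batch_x, y: batch_y})  # batch_x/batch_y are your training data

Since beta1 is an ordinary tf.Variable, minimize() computes its gradient and updates it together with the other weights; no manual derivative is needed.
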
Gilfoyle

That's the beauty of automatic differentiation! You don't need to know how to compute the derivative of your function, as long as you build it from TensorFlow ops that are themselves differentiable (a few ops in TensorFlow simply are not differentiable).

For everything else the derivative is computed for you by TensorFlow: any combination of inherently differentiable operations can be used, and you never need to think about the gradient yourself. You can validate this with tf.gradients in a test case, which shows that TensorFlow is computing the gradient of your cost with respect to your variable.
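
A minimal sketch of such a check, mirroring the tf.constant suggestion in the comments below (the toy shapes and the sum-as-cost are assumptions for illustration):

import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0]])               # toy input, shape [1, 3]
beta = tf.Variable(tf.random_normal([3]))        # the trainable slope
out = tf.sigmoid(beta * x)                       # the custom activation
cost = tf.reduce_sum(out)                        # any scalar cost will do

grad = tf.gradients(cost, beta)                  # note the order: ys first, xs second
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))                        # a concrete array, not None

If the gradient came back as None, TensorFlow could not differentiate the graph with respect to beta and you would have to investigate.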

Here's a really nice explanation of automatic differentiation for the curious:

https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/

You can make sure that beta is a trainable parameter by checking that it appears in the collection tf.GraphKeys.TRAINABLE_VARIABLES; that means the optimizer will compute its derivative w.r.t. the cost and update it. If it is not in that collection, you should investigate.
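
A quick way to perform that check, assuming the beta_weights dictionary from the accepted answer, might be:

# Hypothetical check that beta1 is registered as a trainable variable.
trainable = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
print(any(v is beta_weights["beta1"] for v in trainable))   # should print True

# Or simply list everything the optimizer will update:
for v in tf.trainable_variables():
    print(v.name, v.shape)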

David Parks
  • So is my approach right? Can I just use it as shown in my question? – Gilfoyle Apr 19 '18 at 15:14
  • Don't you love it when that happens? Yes. You can do a simple test case using `tf.gradients` to prove it to yourself. Set up a very simple example using a `tf.constant` as input, then run `tf.gradients(cost, beta)` and you'll get the derivative that's used to update beta. If something is wrong you'll get None or an error. – David Parks Apr 19 '18 at 15:16
  • Yeah, I really do! I am going to answer my own question with a working example. – Gilfoyle Apr 20 '18 at 11:30